ABSTRACT


This report is concerned with the implementation of an on-line
information storage and retrieval system for the Rome Air Development
Center. This system is to incorporate techniques of automatic document
classification for a large document collection in an interactive environment.
Following a review of the system design, the implementation of the system
executive is described in detail. Because this executive program also governs
communications between the user and the system, it must be a communications
package, a training aid, a file building program and an executive program
all in one.

SECTION I


INTRODUCTION 

This document is the final report on the development of an on-line
information storage and retrieval system for the Rome Air Development
Center, Air Force Systems Command, Griffiss Air Force Base, New York.
Under this contract, an on-line storage and retrieval system, called for
brevity in this report the On-Line System, has been designed, and its
executive, called in this report the dialogue processor, has been programmed.
The dialogue processor has been provided with routines to simulate the rest
of the On-Line System, so that to the user, the entire System appears to be
implemented.

An overview of the design of the On-Line System is first presented,
along with a summary of the present status of the dialogue processor
and its supporting programs. The operation of the dialogue processor is
then described in detail, and two examples of actual user dialogues with the
dialogue processor are presented to illustrate the discussion. Construction
techniques for the files it accesses are presented. A description of the
subprograms that make up the dialogue processor and a discussion of useful
areas for further work conclude the report.

SECTION II


CURRENT  STATUS  OF  THE  SYSTEM 

This section presents the current status of the On-Line System.
An overview of the design of the System is presented, including a discussion
of the user and System actions that take place during a query sequence. The
section concludes by discussing in detail the present status of both the
System design and the dialogue processor.

II.1 HISTORICAL BACKGROUND AND INTRODUCTION

The computer is a potentially powerful tool for browsing through
vast quantities of information. The speed and storage capacity of modern
computer systems promise to make the resources of a library available
without the huge investment of time required to establish, maintain,
and use the manual searching aids usually associated with a library.

Much work has been done in the development of on-line systems,
and additional background on other systems is provided in other papers
(1, 2, 3, 4, 6, 7, 9). However, all of these systems retrieve by means of
simple coordinate indexing and various embellishments on it. Only one type
of ranking exists in such systems--the identification of relative relevance
of retrieved documents. It is obtained from a tally of the number of
elements of "or" clauses retrieving each document. This form of ranking is
crude in that it does not give a very sensitive measure of relevance.

Most such systems rely on manual indexing of documents and
retrieve on descriptors. Some allow the use of an on-line thesaurus; some
also allow retrieval on title or author's name. A few allow searching on
partial words and word phrases. Thus, most present on-line systems rely
on manual indexing (except for title and author information) and perform
retrieval based on logical connectives. Except for provision of thesauri,
little is done to help the user with synonym problems.

The size of collections presently being encountered, their growth
rate, the scarcity of competent indexing personnel, and the cost of manual
indexing have promoted the search for automated methods of document
classification for retrieval (8). This includes not only the actual indexing
of documents, but the development of thesauri. Naturally, the development of
new retrieval techniques is intimately linked with work in classification
methods.

II.2 OBJECTIVES OF THE RADC ON-LINE RETRIEVAL SYSTEM

The RADC On-Line Retrieval System is an attempt to overcome
many of the difficulties that have been associated with information storage
and retrieval systems through the use of a new approach to the problem of
on-line retrieval--the concept vector technique. This System is to enable
classification of documents to be performed automatically; the correlation
that can be performed on documents indexed by concept vectors should be
far superior to that which can be achieved with coordinate indexing. The
use of the concept vector technique will also permit retrieval based on
similarity to any specified document in the collection.

Concept  vector  indexing  is  performed  in  batch  mode  for  use 
within  the  framework  of  a  fully  interactive  on-line  system.  The  on-line 
nature  of  the  retrieval  System  operation  imposes  fundamental  constraints 
on  the  entire  System  design,  if  the  result  is  to  be  useful. 

Batch system queries are frequently written by a system "expert"
who interprets the information requests submitted by users. However,
in interactive systems, the user himself formulates queries and
operates the system. Therefore, for successful operation, the on-line
dialogue must be easy and natural to use.

A user must be able to concentrate on the problems of retrieving
information, and not be required to second-guess the designers of the System.
Users with differing levels of familiarity are to be expected. The
inexperienced user must be led through the System step-by-step, whereas the
experienced user should be able to exercise a great deal of flexibility in
employing the System. The messages from the experienced user to the
System would be expected to be terse, whereas those from the neophyte
would be more verbose and tutorial.

Consider  the  additional  power  of  an  on-line  system  if  the  user 
is  given  the  ability  to  locate  documents  that  are  in  some  way  like  a  known 
document.  This  can  be  illustrated  by  considering  the  problem  of  finding 
documents  in  the  stacks  of  a  conventional  library.  Suppose  that  one  could 
only  request  documents  by  their  classification  numbers  and  that  one  were 
not  allowed  to  enter  the  stacks.  Depending  on  the  user's  familiarity  with 
the  classification  system  and  with  the  document  collection,  he  might  or 
might  not  be  able  to  retrieve  all  of  the  documents  relevant  to  his  needs. 

Now, if the user can enter the stacks of the library and browse
about, his chances of finding useful documents are increased. They are
likely to be physically near the documents specified initially (and, of course,
may include those documents). The hierarchical classification scheme of
the library has been mapped into one-dimensional space: the ordering of the
books in the stacks. An on-line system can be built in such a way that the
user is free from the constraints of a space of limited dimensionality and
can search for documents "like" a given document. This is known as
document-document searching and is analogous to browsing in a library
where every intellectual area (or "concept") corresponds to a different
dimension.

A further novelty of the RADC On-Line System concerns the size
of the data base to be indexed and accessed; it will eventually contain more
than 100 million characters of text. The size of the data base presents
particular problems in the design of the off-line programs that perform
indexing. The indexing processes must be designed to avoid rapid growth
of core requirements as the data base size increases. For example, whereas
storage of a similarity matrix for 100 documents requires 10,000 similarity
coefficients to be computed and stored, the same matrix for 1,000
documents would have 1,000,000 elements. Thus, many processes that are
useful for small document collections simply cannot be used with a large
data base.
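
The quadratic growth cited above can be checked with a few lines of
arithmetic. The following is an illustrative sketch only: the function name
is hypothetical, a full (not triangular) matrix is assumed, as in the figures
quoted in the text, and the 40,000-document size is borrowed from the example
collection size mentioned later in this section.

```python
def similarity_matrix_elements(num_documents: int) -> int:
    """Number of coefficients in a full document-document similarity matrix."""
    return num_documents * num_documents


# Collection sizes: the two from the text, plus a larger illustrative one.
for n in (100, 1_000, 40_000):
    count = similarity_matrix_elements(n)
    print(f"{n:>6} documents -> {count:>13,} coefficients")
```

A tenfold increase in collection size multiplies the storage requirement by
one hundred, which is why matrix-based methods do not carry over to a large
data base.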


II.3 AN OVERVIEW OF THE ON-LINE SYSTEM

This subsection highlights the most significant design features
of the RADC On-Line Retrieval System. Note that the present System
does not include all the designed features.

II.3.1 Indexing and Retrieval Sequences

Although the On-Line System operates on concept vectors, it
must use a thesaurus. The thesaurus contains word stems rather than
words, and is automatically developed from the document file. First,
common words (e.g., a, an, the) are removed and stem analysis is employed
in order to select the distinct noncommon stems occurring in the document
collection. This large group of stems is reduced to a smaller collection of
so-called content stems, which constitutes the thesaurus. This selection of
the content stems from the collection of raw stems is to be performed by
the statistical filtering program, which selects those word stems most
promising for the characterization of documents. It does this by analysis
of both the stem rank-frequency distribution and the variation of that
distribution over the document collection.
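
The thesaurus-building steps just described can be sketched as follows. This
is a minimal illustration under stated assumptions, not the System's actual
procedure: the common-word list, the suffix rules in `stem`, and the use of a
simple document-frequency band in `content_stems` as a stand-in for the
statistical filtering program are all hypothetical.

```python
from collections import Counter

COMMON_WORDS = {"a", "an", "the", "of", "and", "in", "for", "is"}  # assumed sample
SUFFIXES = ("ing", "ed", "es", "s")  # assumed; the report does not list its rules


def stem(word: str) -> str:
    """Crude suffix stripping, standing in for the System's stem analysis."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def raw_stems(text: str) -> list[str]:
    """Drop common words, then reduce each remaining word to its stem."""
    words = [w.strip(".,;:()\"'").lower() for w in text.split()]
    return [stem(w) for w in words if w and w not in COMMON_WORDS]


def content_stems(documents: list[str], min_docs: int, max_docs: int) -> set[str]:
    """Keep stems whose document frequency lies in [min_docs, max_docs].

    This band filter is only a placeholder for the statistical filtering
    program, which analyzes the rank-frequency distribution of the stems.
    """
    doc_freq = Counter()
    for text in documents:
        doc_freq.update(set(raw_stems(text)))
    return {s for s, n in doc_freq.items() if min_docs <= n <= max_docs}
```

Stems that occur in nearly every document, or in almost none, discriminate
poorly between documents, which is the intuition behind discarding both ends
of the frequency range.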

With every document is associated its concept vector. This
vector consists of concept-weight pairs. A concept vector can be formed
from any body of text; therefore, in order to perform a retrieval query, it
is only necessary to derive the concept vector for the query and correlate
it with concept vectors for the documents in the collection. Those
documents with vectors producing the highest correlation are then retrieved.

The concepts themselves could, of course, be word stems.
However, this would not allow the System to account for the use of words
that are similar in meaning, and would introduce one of the worst drawbacks
of simple coordinate indexing--the need for the user of the System to
consult a thesaurus of "use" and "used for" terms. Instead of this
stem-per-concept approach, the System is to cluster stems into about 1500
groups. Each group contains stems of similar semantic value, and each group
corresponds to a concept. The clustering is to be performed on a basis of
statistical stem co-occurrence analysis.

An ordinary retrieval on the basis of a text query is performed
in the following manner. First, the user's request is processed by the
routines which reject common words and perform stem analysis, reducing the
query to a sequence of stems. Each content stem is then mapped by a
dictionary processor into one or more clusters. Since each cluster is
associated with a concept, this process produces the concept vector
corresponding to the query. This vector can be correlated against the concept
vectors for the document collection in order to perform the retrieval. In
order to avoid comparison with all the concept vectors for a large collection,
say, 40,000 documents, the document concept vectors themselves are
clustered about centroids. This materially reduces the search time.
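
The correlation of a query vector against document concept vectors can be
sketched as below. The report does not specify the correlation measure at
this point, so cosine correlation is assumed here; the names `ConceptVector`,
`correlate`, and `retrieve` are hypothetical, and the exhaustive scan shown
ignores the centroid shortcut.

```python
import math

ConceptVector = dict[int, float]  # concept number -> weight


def correlate(a: ConceptVector, b: ConceptVector) -> float:
    """Cosine correlation of two sparse concept vectors (assumed measure)."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: ConceptVector, documents: dict[int, ConceptVector], top_n: int = 10):
    """Return (accession number, correlation) pairs, highest correlation first."""
    scored = [(correlate(query, vec), acc) for acc, vec in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(acc, score) for score, acc in scored[:top_n]]
```

Because only concept-weight pairs are stored, the cost of one correlation
depends on the number of concepts present in the two vectors rather than on
the roughly 1500 concepts in the whole scheme.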

As mentioned in the last section, document-document correlation
can also be performed by the On-Line System. This form of searching
simply employs the concept vector of a known document in order to retrieve
similar documents. (It is also possible for the user to construct and modify
query concept vectors directly, working only with numeric concept codes
and weights.)

During the retrieval process, the user can be expected to try a
number of queries. Some will retrieve desirable documents, and some will
not. The user is given the capability to build a file of documents, retaining
those which he finds desirable.

II.3.2 Structure of the On-Line System


Figure II-1 shows the overall structure of the On-Line System.
Files are represented by symbols with rounded sides; rectangles represent
programs. An arrow from A to B indicates that A calls B, if A and B are
programs. If B is a file and A is a program, the arrow indicates that A
writes on B; if A is a file and B is a program, then B reads from A.

The dialogue program keeps track of the status of the present
query sequence by maintaining the query sequence status file. Because this
file contains the information needed to direct the operation of the other
programs in the On-Line System, the dialogue program performs the executive
function and is resident in core at all times while the on-line system is in
operation. For this reason, the core requirements of the dialogue program
must be minimized; therefore, the only file that DIALOGUE will keep in
core is the query sequence status file, which will contain the current query
words, stems, concept numbers, weights, and various flags that specify
the status of the query.

The four program modules that are loaded into core by the
dialogue program are shown in Figure II-1 as the four blocks immediately
below the dialogue program. Each of the program modules will be loaded
with the subprograms that it calls. With one exception (CHOOSE, which is
loaded together with FETCH), only one of the four program modules will be
resident in core at once.

II.3.2.1 Files. The entities shown as files in Figure II-1 are not
necessarily distinct files that will be stored on auxiliary storage devices;
rather, every sizable data structure is identified here as a file so that an
explicit decision concerning its residence can be made.

The four files shown on the left margin of Figure II-1 are
arranged hierarchically in order of increasing minimum access time
requirements. Exactly which file is resident on what type of auxiliary
storage device is a decision to be based upon both the amount of auxiliary
storage available and response time requirements. For the System to interact
conversationally with the user, at least the cluster centroids and concept
vectors must be stored on a high-speed, direct-access storage device. The
document file can be allocated to tape or disk storage.

The file structure has been designed to accommodate the widest
possible variation in data base characteristics. The main contributor to
this flexibility is the use of variable-length records in every file. This not
only removes the need for some arbitrary limit on the size of each type of
record, it also greatly increases the efficiency with which the available disk
storage space is used, because every record will occupy only the amount of
space it requires.
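
One common way to realize variable-length records is to precede each record
with its own length. The sketch below illustrates that idea only; the report
does not describe the actual on-disk layout, and the 4-byte big-endian prefix
and the function names are assumptions.

```python
import struct

LENGTH_PREFIX = struct.Struct(">I")  # 4-byte big-endian record length (assumed)


def pack_records(records: list[bytes]) -> bytes:
    """Concatenate records, each preceded by its own length."""
    return b"".join(LENGTH_PREFIX.pack(len(r)) + r for r in records)


def unpack_records(blob: bytes) -> list[bytes]:
    """Walk the blob, reading one length prefix and one record at a time."""
    records, offset = [], 0
    while offset < len(blob):
        (size,) = LENGTH_PREFIX.unpack_from(blob, offset)
        offset += LENGTH_PREFIX.size
        records.append(blob[offset : offset + size])
        offset += size
    return records
```

No record is padded to a fixed maximum, so a short bibliographic entry and a
long document body each consume only their own length plus the prefix.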

The on-line files that will be accessed by program module FETCH are:

1. Documents
2. Bibliographic data
3. Concept vectors
4. Centroids.

Before the on-line system can be used, these data must be loaded into four
distinct GECOS III permanent files by SHOVEL. Four separate files are
used in order to permit all the records that are associated with a given
document in the data base to be obtained by using only the accession
number of the generating document. Because of this, no separate directory
will be necessary, and cross-referencing from a concept vector to a
bibliographic record to the document itself can be performed without
intermediate accesses to a directory.

Figure II-2 illustrates the organization of the on-line files.
The solid arrows represent an explicit "pointing" relationship; the dashed
arrows represent an implicit "pointing" relationship that arises because
the concept vector, bibliographic, and document files are all ordered by
accession number. Thus, the arrows indicate all the possible methods of
cross-referencing the various files.

Figure II-2 On-Line File Structures (panels: Centroid File, Concept Vector
File, Bibliographic File, Document File)

In order that the files can be organized efficiently by accession
number, it is required that the accession numbers be a compact set of
positive integers starting with one. If the data base is supplied without
these integral accession numbers, it is a simple matter to number all the
documents.


The  use  of  a  distinct  file  for  each  class  of  data  also  permits  the 
selective  loading  of  the  various  files.  The  On-Line  System  might  be  used 
for  experiments  that  would  not  access  all  of  the  files.  In  this  case,  the 
selective  loading  of  the  on-line  files,  by  reducing  disk  usage,  will  increase 
operational  economy  beyond  that  which  might  otherwise  be  associated  with 
experimentation  with  the  full  On-Line  System. 

II.3.2.2 Program Modules. This subsection introduces each program
module and gives a brief description of its function.

The primary function of DIALOGUE is to keep track of the user's
status and direct user-system interaction. Therefore, DIALOGUE maintains
the information needed to direct the sequence of operation of the other
program modules and also serves as the executive of the On-Line Retrieval
System.


Program module FETCH performs all accesses to the on-line
data base. Given a record number and a file designation, FETCH returns
the record and size of the record. FETCH obtains only one record at a
time; to obtain all the records in a file, FETCH must be called repeatedly.

FETCH will be loaded by itself or together with CHOOSE.
FETCH will be loaded by itself when a file access is being performed that
does not require selection of concept vectors based on their correlation
with some query vector, such as when scanning of the bibliographic data
or document file is taking place.
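
The FETCH interface described above (record number and file designation in,
record and size out) can be sketched as follows. This is an in-memory
stand-in under stated assumptions: the class name, the file designations in
`FILES`, and the `load` method are hypothetical, and real FETCH would read
from GECOS III permanent files.

```python
FILES = ("document", "bibliographic", "concept_vector", "centroid")  # assumed names


class OnLineFiles:
    """In-memory stand-in for the four on-line files."""

    def __init__(self):
        # One list of records per file; list index = record number - 1.
        self._files = {name: [] for name in FILES}

    def load(self, file_name, records):
        """Replace the contents of one file (hypothetical loader)."""
        self._files[file_name] = list(records)

    def fetch(self, file_name, record_number):
        """Return (record, size) for one record, numbered from one."""
        records = self._files[file_name]
        if not 1 <= record_number <= len(records):
            raise IndexError(f"no record {record_number} in {file_name} file")
        record = records[record_number - 1]
        return record, len(record)


# The same accession number locates related records in different files,
# so no separate directory is consulted.
files = OnLineFiles()
files.load("bibliographic", ["Author A, Title A", "Author B, Title B"])
files.load("document", ["Text of document one.", "Text of document two."])
citation, _ = files.fetch("bibliographic", 2)
text, size = files.fetch("document", 2)
```

Fetching one record per call keeps the interface small; scanning a whole file
is simply a loop over record numbers starting at one.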


Program module CHOOSE, given a query vector by DIALOGUE,
returns to DIALOGUE the accession numbers of the documents whose concept
vectors have the highest correlation coefficients with the query vector.
In order to do this, CHOOSE calls FETCH to obtain the centroids of all
clusters, and then calls CORRELATE to determine which clusters to scan.
When this is complete, FETCH is called to obtain the selected clusters,
and the concept vectors in these clusters are similarly processed by
CORRELATE. CHOOSE then returns to DIALOGUE the accession numbers
of the documents whose concept vectors correlate most highly with the query.
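
The two-stage search performed by CHOOSE can be sketched as below. This is an
illustration under assumptions, not the programmed module: cosine correlation
is assumed for CORRELATE, the centroids and clusters are passed in as
dictionaries rather than obtained through FETCH, and the parameters
`clusters_to_scan` and `top_n` are hypothetical.

```python
import math


def correlate(a, b):
    """Cosine correlation of two sparse vectors (assumed measure)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def choose(query, centroids, clusters, clusters_to_scan=1, top_n=5):
    """Two-stage search: rank centroids, then rank vectors in the best clusters.

    centroids: cluster id -> centroid vector
    clusters:  cluster id -> {accession number -> concept vector}
    """
    # Stage 1: decide which clusters to scan by correlating with centroids.
    ranked_clusters = sorted(
        centroids, key=lambda cid: correlate(query, centroids[cid]), reverse=True
    )
    # Stage 2: correlate only the concept vectors in the selected clusters.
    candidates = {}
    for cid in ranked_clusters[:clusters_to_scan]:
        candidates.update(clusters[cid])
    ranked_docs = sorted(
        candidates, key=lambda acc: correlate(query, candidates[acc]), reverse=True
    )
    return ranked_docs[:top_n]
```

Only the centroids and the members of the selected clusters are examined, so
most document vectors are never correlated with the query. The trade-off is
that a relevant document in an unscanned cluster is missed.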

Program module BUILD operates on a list of words and produces
a concept vector. It does this by first performing stem analysis by calling
STEMS, then mapping the stems into concepts by calling CONCEPTS.
Program STEMS includes within it the list of common words and the list of
stems to be removed; program CONCEPTS includes within it the dictionary
of content stems and the concept numbers and weights into which each is
mapped.


Each  word  in  a  query  can  fall  into  one  of  three  categories.  It 
may  be  a  common  word  that  is  deleted  by  STEMS,  a  word  that  generates  a 
noncontent  stem,  and,  therefore,  is  not  mapped  into  a  concept,  or  a  word 
that  generates  a  content  stem,  and,  therefore,  is  mapped  into  one  or  more 
concepts.  BUILD  will  recognize  and  differentiate  between  these  three  cases 
and  report  this  information  to  DIALOGUE  along  with  the  generated  concept 
vector  and  stems. 
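
The three-way classification performed by BUILD can be sketched as follows.
The common-word list, the sample dictionary `STEM_CONCEPTS`, and the suffix
rules in `stem` are assumptions made for illustration; only the overall flow
(reject common words, stem, look up, and add the weights into a concept
vector) follows the description above.

```python
COMMON_WORDS = {"a", "an", "the", "of", "on"}  # assumed sample list
# Assumed sample dictionary: content stem -> [(concept number, weight), ...]
STEM_CONCEPTS = {
    "retriev": [(12, 1.0)],
    "index": [(12, 0.5), (40, 1.0)],
}


def stem(word):
    """Placeholder stem analysis: strip a few common endings (assumed rules)."""
    for suffix in ("ing", "al", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build(words):
    """Return (concept vector, content stems, noncontent stems, common words)."""
    vector, content, noncontent, common = {}, [], [], []
    for word in (w.lower() for w in words):
        if word in COMMON_WORDS:  # case 1: common word, deleted
            common.append(word)
            continue
        s = stem(word)
        if s not in STEM_CONCEPTS:  # case 2: noncontent stem, not mapped
            noncontent.append(s)
            continue
        content.append(s)  # case 3: content stem, mapped into concepts
        for concept, weight in STEM_CONCEPTS[s]:
            vector[concept] = vector.get(concept, 0.0) + weight
    return vector, content, noncontent, common
```

Returning the noncontent stems separately is what lets the dialogue report
them to the user, who can then decide whether to add more query words.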


BUILD will be called to process a query before calling CHOOSE.
When document-document correlation is being performed, BUILD will not
be used, since the query vector in that case will be obtained by using FETCH
to access the concept vector file.

Program  module  CHAT  communicates  with  the  user.  Standard 
On-Line  System  messages  are  sent  to  the  user  by  calling  SELECT.  Given 
a  message  number,  SELECT  accesses  the  file  of  messages,  selects  one, 
and  calls  BELCH  to  transmit  the  message.  BELCH  transmits  one  line  to 
the  remote  terminal;  GULP  reads  a  line  from  the  terminal. 

When  documents  are  being  printed  at  the  remote  terminal, 
SELECT  will  not  be  used.  DIALOGUE  will  obtain  the  data  to  be  sent  by 
calling  FETCH,  and  then  call  BELCH  to  transmit.  Data  obtained  from  other 
program  modules,  such  as  BUILD,  will  also  be  transmitted  without  a  call 
to  SELECT. 

II.3.2.3 Examples of System Operation. Figure II-3 illustrates the
roles played by the various program modules by showing the sequence of
events that might take place during the processing of a query. This example
shows only the gross features of query processing and document-document
correlation; a sophisticated user would cause a much more complex process
to occur.


During  operation  of  the  system,  DIALOGUE  performs  a  function 
in  addition  to  those  shown  explicitly  in  the  flowchart;  it  directs  the  loading  of 
the  other  program  modules. 

The user begins the sequence by entering a query, which is read
by CHAT. BUILD is then loaded and performs stem and concept analysis,
producing a concept vector if the query contains any words that generate
content stems. DIALOGUE stores this concept vector as the query vector,
and loads CHOOSE and FETCH together. By calling FETCH and CORRELATE,
CHOOSE determines the accession numbers of the documents whose concept
vectors correlate most highly with the query vector. This list is passed to
DIALOGUE.

Figure II-3 Example of System Operation (flowchart; one recoverable box
label: "Remove old query vector from query sequence file")


When  DIALOGUE  has  received  the  query  results,  it  loads  CHAT 
to  transmit  the  results  to  the  user.  At  this  point,  the  user  might  elect  to 
enter  a  new  query,  in  which  case  DIALOGUE  clears  QUERY  SEQUENCE 
STATUS,  or  he  might  elect  document-document  correlation.  He  also  has 
several  other  options  which  are  not  shown  in  this  example. 

Document-document correlation is performed by using FETCH
to obtain the concept vectors that are to be used as query vectors, and
then calling CHOOSE in the same fashion as when processing a user-generated
plain text query.

Figure II-4 shows the sequence of events that might transpire
during the processing of a simple query. This figure emphasizes the System
and user actions, whereas Figure II-3 identifies the specific program
modules that perform each action.

The sequence begins when the user enters his initial query.† The
System first identifies and removes any common words from the query,
without comment. The System then performs stem analysis on the remaining
words, mapping each word into a sort of "canonical form" for its
morpheme.

Each stem is then looked up in the stem-concept dictionary, and
the concept codes and weights thus obtained are added to form the query
concept vector. The stems from the query that were not found in the
dictionary are printed, so that the user can decide whether he wishes to
perform retrieval with his query as it stands, or add more words.

† An experienced user might start differently.

Figure II-4 SIMPLE QUERY (cont'd)

User types in query words (blanks or commas delimit):

    aardvark, microfiche , information science end

(NOTE--"Present Query Table" now contains the words, stems and their
concept vectors. One of the modes permits printing that table at this
point.)*

System forms sum of each stem's concept vectors, forms Query Vector.

(NOTE--Query Vector can be printed under one of the modes at this point.)*

* "Normal Mode" suppresses this--average user probably does not want it.

Once the user decides to retrieve with his query, the System
correlates the query vector with the concept vector for each document in
the collection. The System places a ranked list of the highest-correlated
documents in the collection in a temporary file, for access by the user.

The  System  then  informs  the  user  that  the  retrieval  is  complete. 
He  may  then  print  the  contents  of  the  temporary  file,  which  will  tell  him 
not  only  which  documents  were  retrieved,  but  also  their  correlations  with 
the  query  concept  vector  and  their  rank  by  correlation.  Also,  a  temporary 
identification  number  is  assigned  to  each  document,  so  that  the  user  may 
refer  to  documents  without  typing  in  a  lengthy  accession  number. 
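
The contents of the temporary results file described above can be sketched
as a ranked list of rows. The function and field names are hypothetical, the
accession numbers in the usage example are made up, and letting the
temporary identification number coincide with the rank is an assumption; the
report states only that both are available to the user.

```python
def results_file(scores, keep=10):
    """Rank documents by correlation and assign temporary identification numbers.

    scores: accession number -> correlation with the query concept vector
    The temporary number doubles as the rank here (an assumption).
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"temp_id": rank, "accession": acc, "correlation": corr}
        for rank, (acc, corr) in enumerate(ranked[:keep], start=1)
    ]


# Made-up scores: the user can now say "document 1" instead of typing 311.
rows = results_file({40017: 0.42, 311: 0.91, 2250: 0.67}, keep=2)
for row in rows:
    print(row["temp_id"], row["accession"], f"{row['correlation']:.2f}")
```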

Once the temporary file of results from a retrieval has been
formed, the user has a number of ways in which he can use the results.
He can print bibliographic data for all, some, or one of the retrieved
documents; he can print any of the documents themselves, or he can use the
retrieved documents to find other similar documents in the collection.


II.4 CURRENT STATUS OF THE ON-LINE RETRIEVAL SYSTEM

The System design, including the design of the off-line indexing
programs, is complete and is presented in detail in the Interim Report (5).
Although the present System design does not exhaust the potential capabilities
of automatic indexing, and further study can still produce significant results,
the implementation of the full On-Line System is clearly feasible for a
document collection containing over 100 million characters of text.


The heart of the On-Line System, the dialogue processor, has
been programmed in GECOS III Time-Sharing FORTRAN, and has been
provided with additional routines to perform other System functions in order
to permit experimentation with the user-System interface. The dialogue
processor is thus completely operational, and successful query sequences
have been accomplished, using a test collection of 50 documents provided by
RADC.


In its present configuration, the dialogue processor provides all
features of the user-System interface. Therefore, to the user, the presently
operating portion of the System appears to be the entire System. Several
detail refinements of the user-System interface have been made as a result
of experiments conducted with the dialogue processor. This prototype will
still be useful as an easily modified test bed even when the entire System
has been implemented.

SECTION III


MAN-MACHINE DIALOGUE

The heart of the On-Line System is the software module which
governs communications between the user and the System. This module--the
dialogue processor--also performs the executive function of the On-Line
System. It calls on the routines which perform stem analysis, retrieval,
ranking, dictionary lookup and all the other functions. It solicits queries
and commands from the remote user, causes search queries to be executed,
reports and stores the results, and generally leads the user through the
array of tools available to him in his searching of the data base. The
dialogue processor is, therefore, a communications package, a training aid,
a file building program and an executive program all in one.

The dialogue processor has been programmed, and is fully
operational. Although the other program modules that are called by the
dialogue processor (see section II.3.2) have not been programmed, the
dialogue processor has been provided with supporting routines that simulate
the operation of these modules. In this manner, to the user, the entire
On-Line System appears to be implemented. Thus, the two sample dialogues
included in this section are essentially identical to dialogues that will be
conducted with the full On-Line System.

This section presents a functional description of the dialogue
processor in subsection III.1. The discussion is then illustrated by two
actual retrieval dialogues, as might be conducted by two users of differing
experience levels. These two examples show how the novice and the experienced
user might use the same query to obtain information with the System.

Crucial to the operation of the dialogue processor are the various
files that it accesses; in subsection III.2, the format and method of
construction of each file are discussed.

Section III.3 contains a flowchart of the dialogue processor, in
Figure III-13.

III.1 FUNCTIONAL DESCRIPTION


The dialogue processor is designed, insofar as its functional
characteristics appear to the user, with the overriding concept that different
users of differing ability, needs, familiarity and goals will at various times
attempt to use the System. In order for these attempts to succeed, the
System must be geared to the user. The experienced user will not tolerate the
delays incurred as lengthy tutorial messages are printed at the Teletype
terminal; the inexperienced user will flounder without them. The
inexperienced user wants to be led through the operation of the System; he
does not, however, wish to be asked questions about optional employment of
System functions with which he is not familiar. On the other hand, the
experienced user wants to be able to marshal every last resource of the
System. Finally, the inexperienced user should not be kept in a cocoon
forever, and he must at least be given the opportunity to obtain an
explanation of the various available features of the System.

III.1.1 The Query Sequence

The  fundamental  method  of  operation  is  embodied  in  the  concept 
of  a  query  sequence.  Initially,  the  user  sets  up  a  retrieval  command  based 
on  words.  He  is  then  given  the  opportunity  to  inspect  the  results  of  the 
retrieval,  to  modify  the  query  or  to  discontinue  the  query  sequence.  During 
such  a  query  sequence,  a  file  of  retrieved  documents  is  built  up.  The  three 
basic options available to the inexperienced user are:

    "END"   Terminate this search query sequence in order
            to start a new sequence or sign off.

    "MOD"   Modify or replace the present query and continue
            the present query sequence.

    "DOC"   Print data for documents retrieved during this
            sequence, or any documents of known accession
            number. The user is given a choice of the data
            to be printed.

    "HELP"  Not truly an option, this aids the user in the
            selection of the appropriate option name.

Ten other options exist, and those indicated by an asterisk (*) are
actually entered automatically for the inexperienced user.

    "OFF"   Generates file for printing bibliographic data and
            documents off-line.

    "CHG"   Changes the mode of operation (sequence
            termination not required).

    "CON"   Inspects the concept vectors of documents.

  * "RET"   Executes the present retrieval request.

  * "DEL"   Deletes unwanted documents retrieved during
            the present query sequence.

  * "SEE"   Inspects the existing query.

  * "CLR"   Erases the existing query.

  * "WRD"   Adds or deletes query words.

  * "DDC"   Performs document-document correlation.

  * "WGT"   Performs direct manipulation of query concept
            vectors.


In  addition  to  the  options  selected  during  a  query  sequence,  a 
user  may  set  various  modes.  The  inexperienced  user  will  take  the  default 
specification  in  which  all  modes  are  deselected,  while  the  more  experienced 
user  may  select  one  or  more  of  the  following: 


1.  Select  terse  dialogue. 

2.  Skip  formation  of  initial  query  from  words  in  query 
sequence. 

3.  Make  available  statistical  analysis  of  query. 

4.  Make  available  statistical  analysis  of  retrieval. 

5.  Assume sophisticated user.

Selection of the first mode results in terse, rather than verbose,
messages being addressed from the System to the user. Most messages
exist in two forms, and the terse form is used by experienced users. The
second mode skips initial query formation, and lets the user select an option
name immediately after signing onto the System.

If  the  third  mode  is  selected,  the  user  is  asked  if  he  wishes  to 
see  the  words,  stems  and  concept-weight  pairs  forming  a  word-based  query. 
These data are available for printing before the query is executed. Also, he
can  elect  the  printing  of  the  query  concept  vector  itself.  Although  these 
options  also  exist  under  MOD,  mode  3  makes  them  available  for  analysis  of 
the  initial  query.  Note  that  this  mode  does  not  cause  the  data  to  be  printed, 
but  simply  gives  the  user  the  option  of  printing  them. 

Similarly, mode 4 is provided so that the user may be asked
if he wants the contents of the temporary file (accession number, temporary
identification number, rank and correlation when last retrieved, print
suppression and whether or not the last executed retrieval retrieved the
document) printed immediately after each retrieval. These data are otherwise
available, but mode 4, like mode 3, provides a convenience for the serious
student of the System.


Mode  5  simply  causes  "HELP"  to  result  in  the  printing  of  the 
descriptions  of  all  options,  not  just  the  basic  three. 
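
The five modes and the two message forms described above can be pictured with a short sketch. This is only an illustration in Python, not the report's implementation: the class name `Modes`, the field names, the helper functions and the message wording are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Modes:
    """The five selectable modes; all are deselected by default."""
    terse: bool = False                 # mode 1: terse dialogue
    skip_initial_query: bool = False    # mode 2: go straight to the option prompt
    query_statistics: bool = False      # mode 3: offer analysis of the query
    retrieval_statistics: bool = False  # mode 4: offer temporary file contents
    sophisticated: bool = False         # mode 5: HELP describes every option


# Each message is stored as (verbose form, terse form). Wording is invented.
MESSAGES = {
    "ask_option": ('PLEASE ENTER AN OPTION NAME, OR "HELP" FOR AN EXPLANATION.',
                   "OPTION?"),
}


def message(key, modes):
    """Return the terse form only when mode 1 has been selected."""
    verbose, terse = MESSAGES[key]
    return terse if modes.terse else verbose


def help_topics(modes):
    """Options described by HELP: the basic three, or all thirteen under mode 5."""
    basic = ["MOD", "DOC", "END"]
    others = ["OFF", "CHG", "CON", "RET", "DEL", "SEE", "CLR", "WRD", "DDC", "WGT"]
    return basic + others if modes.sophisticated else basic
```

A default `Modes()` corresponds to the inexperienced user's configuration; an experienced user might be represented as `Modes(terse=True, sophisticated=True)`.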

While  building  the  temporary  file,  the  user  can  delete  irrelevant 
documents.  Since  the  file  is  built  up  by  the  process  of  executing  different 
retrieval  requests,  the  re-retrieval  of  documents  already  retrieved  once 
during the sequence may be inhibited at the user's choice.

If  bibliographic  data  for  a  document  have  been  printed  once  during 
a  single  query  sequence,  it  is  unlikely  that  the  user  will  want  these  data 
printed  again.  Such  printing  is  inhibited,  but  the  user  (even  the  inexperienced 
user)  can  override  this  inhibition. 
If any words in the initial query are neither common nor found
in the dictionary, and therefore do not enter into the retrieval process, they
are listed for the user's information. If either none of the query words are
in the dictionary or the query results in the retrieval of no documents, the
user is so informed and asked to enter another query.

When a successful retrieval takes place, the user is told how
many documents were retrieved. The accession numbers of the documents
are placed in the temporary file. The System then, if the user so desires,
starts to print more detailed information about the documents in the
temporary file. For each document, the accession number and temporary
identification number are printed. Then a check is made to see if
bibliographic data for the document have been printed previously during the
query sequence; if not, the bibliographic data are printed. If they have, the
output for the document occupies only a single line.

Clearly,  users  will  infrequently  want  such  data  printed  for  the 
entire  set  of  documents  in  the  temporary  file.  On  the  other  hand,  in  order 
to  modify  his  query  intelligently,  the  user  must  have  some  idea  of  what  he 
has  retrieved.  After  the  data  for  five  documents  have  been  printed,  the  user 
is  asked  if  more  documents  are  wanted.  If  they  are,  five  more  are  printed. 
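
The five-at-a-time printing just described can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names and the `wants_more` callback (standing in for the user's answer to the "MORE" question) are hypothetical and do not come from the report.

```python
def batches(documents, size=5):
    """Yield successive groups of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


def print_in_batches(documents, wants_more, emit=print):
    """Print documents five at a time, asking before each further batch.

    `wants_more` is a callable that returns True when the user asks for more.
    `emit` receives each document to be printed.
    Returns the number of documents printed.
    """
    printed = 0
    for group in batches(documents):
        for doc in group:
            emit(doc)
            printed += 1
        # Ask only if something is left to print.
        if printed < len(documents) and not wants_more():
            break
    return printed
```

Passing the question as a callback keeps the paging logic separate from the terminal handling, so the same routine could serve both the post-retrieval printing and the DOC option.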

When either all the data for the documents in the temporary file
have been printed or the user has decided he has seen enough, he is asked
to enter an option name or, in order to get a brief explanation of the options,
"HELP". A cry of "HELP" from the user results in the printing of
descriptions of the MOD, DOC and END options. Since there is no desire to
keep the inexperienced user from learning more about the System, he is
asked if he wishes to see similar explanations of the remaining ten options,
and if he does these are printed. (Similarly, if he attempts to use the option
CHG, he is asked if he wishes to see a list of the modes available.)
The user is again asked to enter an option name. Although any
legitimate option name will be accepted, the basic three will be used most
frequently. An illegal option name will result in an error message and a
request for an option name or "HELP", so that a user who misremembers a
name is taken back to the point where aid is available.
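
The handling of option names described above amounts to a lookup against the thirteen option names. The following Python sketch illustrates the idea; the function name `read_option`, the returned tuples and the error wording are assumptions made for illustration and are not taken from the report.

```python
# Option names from the report; the three basic ones are offered to every user.
BASIC_OPTIONS = ("MOD", "DOC", "END")
OTHER_OPTIONS = ("OFF", "CHG", "CON", "RET", "DEL", "SEE", "CLR", "WRD", "DDC", "WGT")
ALL_OPTIONS = BASIC_OPTIONS + OTHER_OPTIONS


def read_option(entry):
    """Classify one user entry.

    Returns ('option', name), ('help', None) or ('error', message).
    Hypothetical helper: the report does not specify how entries are parsed.
    """
    name = entry.strip().upper()
    if name == "HELP":
        return ("help", None)
    if name in ALL_OPTIONS:
        return ("option", name)
    # An illegal name leads back to the prompt at which aid is available.
    return ("error", 'ILLEGAL OPTION NAME. ENTER AN OPTION NAME OR "HELP".')
```

Any legitimate name is accepted regardless of the user's experience level, which matches the statement that the basic three are merely the most frequently used.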

The  END  option  causes  the  user  to  be  asked  if  he  is  through  with 
the  retrieval  System.  If  he  is,  the  System  is  shut  down;  if  not,  an  entire 
new  query  sequence  is  initiated. 

The DOC option allows the user to obtain more information about
the documents presently in the temporary file, or any other documents for
which the accession number is known. The user is first asked if he wants
only bibliographic information for documents in the present temporary file,
with information previously printed suppressed, just as after an initial
query. If he answers "YES", these data and the temporary file data
are made available, with the question "MORE" following every five
documents in the bibliographic section. It is expected that this would be
done by a user who printed only a small part of the bibliographic data
immediately following a retrieval and then wants to obtain more of it.

If the last-mentioned question is answered "NO", the user is
asked to specify a document or document set of interest to him. He may do
so by entering a single accession number or temporary identification
number, or a range of temporary identification numbers, or the word "ALL"
to signify all the documents in the temporary file. An illegal entry results
in a more detailed explanation of the format required and a request that the
user try again.
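
A document specification of the kind just described might be interpreted as in the Python sketch below. The report does not give the entry format in this section, so the syntax shown ("A13" for an accession number, a bare number for a temporary identification number, "3-9" for a range) is purely an assumption, as is the function name `parse_document_spec`.

```python
import re


def parse_document_spec(entry, temporary_ids):
    """Interpret one document specification.

    Assumed syntax (not given in the report):
      "A13"  -> accession number 13
      "7"    -> temporary identification number 7
      "3-9"  -> temporary identification numbers 3 through 9
      "ALL"  -> every document in the temporary file
    Returns ("accession", n) or ("temporary", [ids]).
    Raises ValueError on an illegal entry, so the caller can explain the format.
    """
    text = entry.strip().upper()
    if text == "ALL":
        return ("temporary", sorted(temporary_ids))
    match = re.fullmatch(r"A(\d+)", text)
    if match:
        return ("accession", int(match.group(1)))
    match = re.fullmatch(r"(\d+)(?:-(\d+))?", text)
    if match:
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        # Keep only identification numbers actually present in the file.
        wanted = [i for i in range(low, high + 1) if i in temporary_ids]
        if wanted:
            return ("temporary", wanted)
    raise ValueError("illegal document specification: " + entry)
```

An accession number is accepted whether or not the document is in the temporary file, since the text allows any document of known accession number to be examined.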

For  each  document  specified,  the  accession  number  is  first 
printed.  If  the  document  is  in  the  temporary  file,  the  following  are  printed: 
its  temporary  identification  number,  rank  and  correlation  on  its  last 
retrieval,  whether  or  not  the  last  executed  retrieval  retrieved  the  document, 
and whether or not the bibliographic data for the document have already been
printed. 
If the document is suppressed from future retrieval, this fact
is stated. Bibliographic data are printed if they have not been printed before;
if they have, the operator is asked if they are to be printed again and
appropriate action is taken. Next the operator is asked if the abstract is to
be printed, and the System prints it in response to an answer of "YES".

If there was only one document specified by accession number
or temporary identification number, the user is given the opportunity to
specify more. If he does, the process continues as above; if he does not, the
System requests an option name.

Printing an entire abstract may take some time, so even if a
set of documents has been specified the user is asked if he wishes to
continue after the printing of an abstract. Similarly, the user is asked if he
wishes to continue after the printing of any information from five documents.
A negative reply in either case results in a request for an option name, or
the specification of other documents to be examined.

The MOD option not only allows the user to modify or replace his
query, but it also automatically transfers the inexperienced user to sections
of other options in order to delete* entries from the temporary file (if
desired or required) and perform retrieval**. Upon entrance to MOD, the
user is first asked if document-document correlation is to be used as the
retrieval method. (Recall that he has started with query words and already
retrieved some documents.)

If both document-document correlation is chosen and the last
retrieval performed was also based on document-document correlation, the
user is given the option of building on the concept vector used in the previous
retrieval or starting afresh. He then builds or adds to a query vector by
specifying any number of documents by means of single accession number,
single or ranges of temporary identification numbers, or all the documents
in the temporary file. After indicating that no more documents are to be
used for the search, the user is asked if he desires to initiate the retrieval.

*  DEL
** RET

The point at which the user is asked about starting the retrieval
can be reached by another path, which is started when the user rejects
document-document correlation. The words forming the last query
performed on a query word basis have been retained (with their stems and
concept-weight mappings), so the user is given the choice of retaining and
building on them or erasing them and building a new set of query words. The
System is so designed that a user can inspect, modify and again inspect the
set of query words, and so the user is asked if he wishes to inspect or modify
the set or not. A negative answer causes the user to be asked if he wishes to
initiate retrieval.

If the user indicates that he does wish to inspect or modify the
set of query vector words, the present set (with stems and concept-weight
pairs) is printed and he is then asked if he wants to add or replicate any
words. If he does, he is asked to enter the words. Any noncommon,
nondictionary words are reported to the user if they are entered, and he is
again given the chance to add or replicate words. The user is then given
the opportunity to delete words, and informed if he attempts to delete any
words not present and allowed to try again.

Next the user is given the opportunity to inspect the query
concept vector directly, and if he so elects it is printed. He may add signed
concept number-weight pairs, and is informed of any illegal concept
numbers that he attempts to enter.

Use of the above three methods of query vector modification, or
some combination of them, eventually leads the user to the point where he is
asked if he wants a retrieval performed. It is possible that he wants to
return to the point of entering an option name; for example, he might want
to have some additional document information printed, and then return to
building a document-document correlation query. In such an event, he would
answer the question about initiating retrieval in the negative.

When  the  user  indicates  that  he  does  want  to  perform  a  retrieval, 
the  dialogue  processor  determines  if  the  query  concept  vector  is  null.  If  it 
is,  the  user  has  obviously  become  confused,  and  he  is  given  the  opportunity 
of  either  starting  a  new  query  sequence  or  resuming  the  present  sequence 
with  a  new  option  name. 

Assuming that a retrieval is requested and the query vector is
not null, the user is informed if the temporary file is empty. He is asked
to specify if documents previously retrieved during the query sequence are
to be excluded from re-retrieval or not, and he is asked if printing of
bibliographic data already printed once should be allowed or suppressed.

If the temporary file is full, the user is told that he must make
space for the documents to be retrieved; if it is partially filled he is given
the opportunity to delete documents. Documents to be deleted are specified
by accession number, temporary identification number or range of
temporary identification numbers. Alternatively, the entire temporary file
may be deleted.


Then, in order that the user may identify contents of the
temporary file with the particular queries retrieving them, he is informed of
the starting temporary identification number of the documents to be
retrieved, and the retrieval is performed.

If  no  documents  are  retrieved,  the  user  is  so  informed  and 
asked  to  enter  an  option  name  or  "HELP".  If  the  retrieval  is  successful, 
the  system  continues  just  as  it  does  after  a  successful  initial  retrieval. 
III.1.2 The Temporary File


Every  time  a  retrieval  is  successfully  executed  during  a  query 
sequence,  information  concerning  the  documents  retrieved  is  added  to  the 
temporary  file,  continuing  until  all  retrieved  documents  have  been  placed 
in  the  file  or  until  the  file  is  full.  The  file  capacity  is  50  documents,  but  it 
may  contain  results  of  previous  retrievals.  Before  a  retrieval  is  executed, 
the  user  is  informed  that  the  file  is  presently  empty,  or  informed  of  the 
remaining space and asked if additional space is required, or told that the
file  is  full  and  that  additional  space  must  be  created.  If  he  retrieves  more 
documents  than  there  are  spaces  in  the  file,  only  the  highest  correlated 
documents  are  placed  in  the  file. 
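
The capacity rule described above can be illustrated with a short Python sketch. Only the capacity of 50 documents comes from the text; the function name `admit` and the representation of a retrieved document as an (accession number, correlation) pair are assumptions made for the example.

```python
CAPACITY = 50  # file capacity stated in the report


def admit(retrieved, occupied):
    """Choose which newly retrieved documents fit in the temporary file.

    `retrieved` is a list of (accession_number, correlation) pairs and
    `occupied` is the number of entries already in the file.
    Returns the pairs to be added, highest correlation first.
    """
    free = CAPACITY - occupied
    if free <= 0:
        return []  # the file is full; the user must delete entries first
    ranked = sorted(retrieved, key=lambda pair: pair[1], reverse=True)
    return ranked[:free]
```

For instance, if 45 entries remain from earlier retrievals and a new retrieval finds 60 documents, only the five with the highest correlations would be kept under this sketch.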

During any query sequence, each retrieved document is assigned
a temporary identification number. This number is used only for
convenience, since it is potentially much shorter than the document's
accession number. The user may need to specify a document for deletion
from the temporary file, for the printing of bibliographic data or of the
document itself, or for document-document correlation.

The  temporary  file  contains  only  the  following  information: 

1.  Accession  number; 

2.  Temporary  identification  number; 

3.  Flag indicating if the last executed retrieval retrieved
the document;

4.  Flag indicating if the bibliographic data for the
document have been printed and the printing inhibition not
removed;

5.  Correlation  obtained  during  the  last  retrieval  of  the 
document; 

6.  Rank  obtained  during  the  last  retrieval  of  the  document. 

Subsection III.2.7 discusses in detail the manner in which the
temporary file is stored.
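
The six items listed above can be pictured as one record per document. The Python sketch below is only an illustration of the logical content; the class and field names are assumptions, and it says nothing about the actual storage layout, which the report treats separately.

```python
from dataclasses import dataclass


@dataclass
class TemporaryFileEntry:
    """One temporary file entry; field names are hypothetical."""
    accession_number: int           # item 1
    temporary_id: int               # item 2
    retrieved_by_last_query: bool   # item 3
    bibliography_printed: bool      # item 4: print inhibition still in force
    correlation: float              # item 5: from the last retrieval
    rank: int                       # item 6: from the last retrieval


def should_print_bibliography(entry, override=False):
    """Bibliographic data are printed once per sequence unless the user overrides."""
    return override or not entry.bibliography_printed
```

The second function mirrors the print inhibition described in III.1.1: data already printed are suppressed, but the user can always override the suppression.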
In addition to the temporary file, there is a list of documents
whose retrieval is excluded. These are documents which have been retrieved
at least once during the retrieval process, that the user does not want to
re-retrieve.


III.1.3 Query Types

Initially, a set of query words is entered by the user.* A file
containing these words, their stems and weighted mapping into concepts is
established. For additional retrievals during the query sequence, the file
may be cleared and a new query entered, or words may be deleted, added
or replicated, building on the initial query.

After  a  retrieval,  the  query  concept  vector  is  retained.  If  the 
next  retrieval  is  based  on  query  words,  the  query  concept  vector  is  simply 
cleared  and  a  new  vector  constructed  from  the  query  word  file.  The  query 
word  file  itself  may  be  entirely  new  or  formed  by  adding  and  deleting  words 
from the previous query word file. In the case of document-document
correlation, the user may either build on the existing query concept vector or
generate  an  entirely  new  one. 

It  is  also  possible  for  the  user  to  manipulate  the  query  concept 
vector  directly. 
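
The relationship between query words, concept-weight pairs and the query concept vector can be sketched in Python as follows. Several things here are assumptions rather than statements from the report: the toy dictionary contents, the rule that weights of repeated (replicated) words simply add, and the use of the cosine measure as the correlation. The cosine is chosen only because it gives a document a correlation of 1.0 with itself.

```python
import math
from collections import defaultdict

# Hypothetical dictionary: word -> list of (concept number, weight) pairs.
DICTIONARY = {
    "information": [(12, 0.8), (40, 0.2)],
    "engineering": [(40, 0.9)],
}


def query_vector(words, dictionary=DICTIONARY):
    """Sum concept weights over the query words.

    Returns (vector, unknown), where `vector` maps concept number to weight
    and `unknown` lists the words not found in the dictionary.
    """
    vector = defaultdict(float)
    unknown = []
    for word in words:
        pairs = dictionary.get(word.lower())
        if pairs is None:
            unknown.append(word)
            continue
        for concept, weight in pairs:
            vector[concept] += weight
    return dict(vector), unknown


def correlation(a, b):
    """Cosine of the angle between two sparse concept vectors (assumed measure)."""
    dot = sum(weight * b.get(concept, 0.0) for concept, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0
```

Under this sketch, document-document correlation would use a stored document vector, or a sum of several, in place of the word-derived query vector, and direct manipulation would add signed weights to individual concept numbers.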

III.1.4 Levels of Document Information

Information concerning documents is available on three levels.
First is the temporary file information, obviously available only for
documents retrieved during the present query sequence. The only permanent
information in the file is the document's accession number.


*  The  experienced  user  may  skip  this  activity. 
There are also the bibliographic data, with such elements as
author, title, date, etc. These data may be printed in a relatively short
time, and the user may obtain them for either documents in the temporary
file or for any other document whose accession number is known.

Finally, there are the documents themselves. These can be
obtained in the same manner as the bibliographic data, and, of course, are
comparatively lengthy. (In the presently contemplated data base, the
"documents" are in fact abstracts of other documents.)

III.1.5 Two Retrieval Dialogues

Figure III-1 contains two actual dialogues with the dialogue
processor, as might result from use of the System by users of two different
experience levels. For purposes of illustration, both users begin with the
same query, and perform similar actions. The inexperienced user, whose
query appears in Figure III-1(a), is guided extensively by the dialogue
processor, and is not offered the display of various internal information that
would only confuse him. On the other hand, the experienced user is
permitted to display data that give great insight into the workings of the
System. Thus, Figure III-1(a) emphasizes the tutorial operation of the
System, while Figure III-1(b) shows in some detail the operation of the
System.

III.1.5.1 Dialogue with Inexperienced User. This discussion refers to the
dialogue  of  Figure  III-l(a).  This  user  knows  what  the  System  does,  but  has 
neither  the  need  nor  inclination  to  find  out  how  the  System  works.  He  knows, 
for  instance,  that  he  should  start  a  query  by  answering  "yes"  to  the  question, 
"Is  normal  operation  desired?".  This  gives  him  the  verbose  form  of  all 
messages,  and  the  simplest  sequence  of  questions. 

His initial query concerns information processing and information
engineering. Three of the words in his query, "representation",
"boradest", and "sense", are not in the stem dictionary, and are therefore
not useful for retrieval from the collection. One of these, "boradest", is a
misspelling of "broadest", but for simplicity, he is not asked if he wants to
change his query; an experienced user would be able to correct his spelling
at this point, by modifying his query.

Figure III-1(a) Dialogue with Inexperienced User

[Teletype listing; only the following fragments are legible.]

RESEARCH PERSONNEL. THIS DOES NOT OCCUR, PRIMARILY BECAUSE THE
STUDY OF OPERATING INFORMATION SYSTEMS HAS NOT DEVELOPED A FORMAL
SET OF TOOLS AND SYMBOLS BY WHICH THESE PROCESSES CAN BE
QUANTITATIVELY DESCRIBED.
MORE?

DO YOU WANT TO DELETE ANY WORDS?
NO

DO YOU WANT TO ERASE COMPLETELY YOUR PRESENT QUERY AND ENTER
NEW QUERY WORDS?

The processor performs a retrieval, placing 31 documents in
the temporary file. The user indicates that he would like bibliographic
information for some documents; he is given this for five documents. He then
decides, on the basis of these data, that he would like to see the entire
ranking table of documents retrieved. He enters option "DOC" to print this
table. To print selected documents from the table, he again enters option
"DOC". He bypasses printing of the table to print selectively. The title,
"Will there be a profession of information engineering?" appears to be very
similar to his query, so he prints the abstract. Satisfied that this document
is very nearly what he wanted, he decides to find others like it. But he has
used only option "DOC", and does not know how to perform
document-document searching. His cry of "HELP" produces a list of the
options he can select at this point. Seeing that he can use "MOD", he
suppresses printing of further options.

The highest-ranked document alone is used for document-document
searching, without modifying the query vector. The user does not
exclude from re-retrieval documents retrieved during this sequence; he
wishes to observe changes in the ranking, and use this information to guide
his browsing. But he will not require re-printing of bibliographic data that
he has in front of him, so he suppresses it. Since he has only one document
he knows to be of interest, he deletes all entries from the temporary file.
Following the retrieval, he prints the ranking table, entering option "DOC"
to do so.


From the table, note that document A13 (accession number 13)
correlates with itself with a correlation of 1.0, as would be expected.
Document A11 has remained second-ranked, while A15 has risen from fifth
to third. The user decides to print bibliographic data for these. After
obtaining five sets of bibliographic data, he enters option "DOC" to print
selected data. He finds both A11 and A15 of interest, and decides to modify
his query, adding material from these documents.
After entering option "MOD" to change his query vector, the
user incorrectly answers "SEE" to indicate that he wants to see his query
words. The System corrects him, and asks him to answer again. Because
no words were entered for this query (because document-document searching
was performed) there are presently no words in the query. An experienced
user would also be able to list the concept numbers and weights in his query.
He adds to his query words dealing with information science. He then
performs another retrieval. Once again, because of the small size of the
collection, he clears the temporary table before retrieving.


He  once  again  enters  option  "DOC"  to  print  the  ranking  table. 

In this table, A13 is no longer top-ranked; the changes have made the query
more like A11 than A13. Documents A15 and A13, which are now second and
third, have already been printed; so the user decides to inspect A14, which
in three retrievals has ranked seventh, eighth, and fourth. The bibliographic
data confirms his interest, and he prints the abstract.


The user now has found four apparently relevant documents. At
this point, he would probably look at the actual documents, to make a final
relevance judgment. Then, if he was not completely satisfied with these
four, he might initiate another query sequence. Thus, in addition to the
browsing that takes place during a query sequence as illustrated by this
example, there could exist another higher level of browsing, as the user
converged upon the desired documents by successive query sequences,
alternating with inspecting documents.


III.1.5.2 Dialogue with More Experienced User. Figure III-1(b) shows a
dialogue that might be conducted by a more experienced user, who has used
the System several times and who is becoming proficient in its use. For
purposes of this discussion, the initial query entered by this user is the
same as the one entered in the dialogue of Figure III-1(a).

Figure III-1(b) Dialogue With Experienced User (cont'd)


The start of the dialogue illustrates the manner in which the
System assists the user when he has trouble; even an experienced user will
occasionally rely on this feature. In this case, the user indicates that he
does not want normal operation, and refuses an explanation of the modes
that are available to him. But when he is asked to identify flags to be turned
on, he realizes that he has forgotten how the flags are used. When he enters
"WHAT?" to the question about flags to be set, he is given a second
opportunity to see the list of modes, which he uses.

The  user  elects  to  set  flags  1,  3,  and  4.  Flag  1  selects  the 
terse  form  of  System  messages  to  the  user;  flag  3  enables  the  user  to 
obtain  a  complete  analysis  of  his  query  before  performing  a  retrieval,  and 
flag  4  permits  the  user  to  print  the  ranking  table  after  a  retrieval. 

Once  the  user  has  selected  mode  flags,  he  must  choose  an  option. 
He  enters  "HELP"  to  see  the  list  of  options;  but  he  has  selected  terse  dia¬ 
logue,  and  therefore  does  not  receive  a  complete  explanation  of  their  pur¬ 
poses.  To  get  this  explanation,  he  uses  option  "CHG"  to  cancel  his  selection 
of  terse  dialogue,  gets  the  explanation  of  all  options,  restores  his  previous 
flag  selections,  and  continues. 

The query the user enters is identical to the previous user's
query, as discussed above, including the misspelling of "broadest" as
"boradest". Realizing his typing error, the user corrects his mistake by
entering the word, typed correctly. It is not necessary for him to delete
the misspelled version, since words whose stems are not in the stem-concept
dictionary are not retained in the query. In order to observe the System's
processing of his query, the user prints the query concept vector and present
table. His query contained ten content-stem-producing words. "Information"
appeared three times, accounting for the weight of 3.0 assigned to concept
number 421 in the query vector. "Processing" occurred twice, giving a
weight of 2.0 to concept number 648.
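
The weighting just described can be sketched in a few lines of Python. This is only an illustration, not the dialogue processor's own code: the concept numbers 421 and 648 come from the text, while the stem spellings, the function name `query_vector`, and the sample stem list are assumptions made for the example.

```python
from collections import Counter

# Hypothetical excerpt of the stem-concept dictionary: stem -> concept number.
# Only the concept numbers 421 and 648 appear in the report; the stem
# spellings are assumed.
STEM_TO_CONCEPT = {"INFORMA": 421, "PROCES": 648}


def query_vector(stems):
    """Return {concept number: weight}; each occurrence of a stem adds 1.0."""
    weights = Counter()
    for stem in stems:
        concept = STEM_TO_CONCEPT.get(stem)
        if concept is None:
            continue  # stems absent from the dictionary are not retained
        weights[concept] += 1.0
    return dict(weights)


# "Information" three times, "processing" twice, plus the misspelled word.
stems = ["INFORMA", "PROCES", "INFORMA", "BORADEST", "INFORMA", "PROCES"]
vector = query_vector(stems)
```

Because the misspelled stem has no dictionary entry, it simply contributes nothing, which is why the user need not delete it.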

The retrieval produces the same result obtained in the earlier
dialogue, since the same query was used. The user moves directly into
option "DOC" to print the abstract of A13, the highest-ranked document
retrieved. This document is very similar to his query; therefore, he
decides to use document-document searching to find another document
similar to A13. It is expected that most query sequences will be conducted
in this way, since document-document searching is a rapid way to construct
a complex query vector.

In the results of the document-document search, A13 is of
course ranked highest, since it correlates perfectly with itself. The user
prints bibliographic data for A11 and A15, which are ranked 2 and 3.
Satisfied that these documents are of interest, he decides to perform
document-document searching based on both of these documents, and A13. Note that
the number of concepts in the query concept vector has grown from 7, for
the initial query, to 34, for the first document-document search, to 50, for
the present document-document search based on three documents. In this
manner, the user is building an increasingly complex query vector, that
describes more and more precisely the concept vector of the document he
seeks. In this case, the experienced user, by using the multiple-document
facility of document-document searching, has developed a much more
sophisticated concept vector than the beginning user, who merely modified
his previous query by adding a few words.
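
Document-document searching can be pictured with the following Python sketch. The report does not state which correlation measure the System uses, nor how the vectors of several documents are merged; the cosine correlation and the component-wise sum below are assumptions chosen because they reproduce the one property the text does state, that a document correlates with itself at 1.0. All names and the tiny three-document collection are hypothetical.

```python
import math


def correlation(a, b):
    """Cosine correlation of two concept vectors held as {concept: weight}."""
    dot = sum(weight * b.get(concept, 0.0) for concept, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combine(vectors):
    """Merge several document vectors into one query vector (assumed: by summing)."""
    merged = {}
    for vector in vectors:
        for concept, weight in vector.items():
            merged[concept] = merged.get(concept, 0.0) + weight
    return merged


def rank(query, collection):
    """Return document names ordered from highest to lowest correlation."""
    return sorted(collection,
                  key=lambda name: correlation(query, collection[name]),
                  reverse=True)


# Hypothetical concept vectors; real ones hold up to fifty components.
docs = {
    "A13": {1: 1.0, 2: 0.5},
    "A11": {1: 1.0, 3: 0.5},
    "A15": {4: 1.0},
}
single = rank(docs["A13"], docs)
multi_query = combine([docs["A13"], docs["A11"]])
```

Merging documents enlarges the query vector, which is the growth from 7 to 34 to 50 concepts that the dialogue shows.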

At this point, the user decides to inspect the abstracts of the
highest-ranked documents in the latest retrieval that have not yet been
printed, namely A6 and A22. He has now read abstracts of five
documents that are apparently of interest, so he signs off the System.
Presumably he would obtain hard-copy (or microimage) versions of the
documents. If these documents did not fulfill his need, he might browse
further by initiating another query sequence.

III.2 FILES ACCESSED BY THE ON-LINE SYSTEM


This section discusses in some detail each of the files accessed
by the On-Line System. Figure III-2 names these files and describes their
contents. The word "file" is used here to refer to a collection of data that
is logically distinct, rather than to indicate any sort of programming method.
Thus, some of these "files" are actually GECOS III quick-access files, as
shown in Figure III-2(a), while others might be called "pseudo-files", since
they are stored as part of the dialogue processor (Figure III-2(b)).

III.2.1 Document File--DATA1


The document file, DATA1, consists of fifty abstracts, with
bibliographic data, furnished by RADC for purposes of testing the dialogue
processor. Figure III-3 shows the beginning and end of DATA1.

III.2.2 File WORDS


This file contains five types of information: common words and
suffixes of one, two, three and four characters. The file is indexed by line
number, with the first digit of the line number indicating the type of
information stored on that line. This digit is 0 for common words and 1, 2, 3,
or 4 for suffixes of the corresponding length. The next three digits are for
sequencing only. Figure III-4 is a listing of WORDS.
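
As an illustration of this indexing scheme, the Python sketch below splits a WORDS line number into its type digit and sequence number. The function name and the wording of the returned labels are assumptions; the sample line numbers are taken from the WORDS listing of Figure III-4.

```python
def classify_words_line(line_number):
    """Split a four-digit WORDS line number into (kind, sequence number)."""
    text = f"{int(line_number):04d}"
    if len(text) != 4:
        raise ValueError(f"expected a four-digit line number, got {line_number!r}")
    type_digit, sequence = int(text[0]), int(text[1:])
    if type_digit == 0:
        kind = "common word"
    elif 1 <= type_digit <= 4:
        kind = f"suffix of length {type_digit}"
    else:
        raise ValueError(f"unknown type digit {type_digit}")
    return kind, sequence


# Line numbers taken from the WORDS listing: 0015 AND, 3018 ING, 4015 MENT.
and_entry = classify_words_line("0015")
ing_entry = classify_words_line("3018")
ment_entry = classify_words_line(4015)
```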

III.2.3 Dictionary File

The  stem-concept  dictionary  consists  of  a  list  of  stems  and  con¬ 
cept  vectors.  With  each  stem  is  stored  a  three-component  concept  vector. 
The  first  component  is  a  unique  concept  number  with  a  weight  of  unity,  and 
the  second  and  third  components  are  null.  This  dictionary  is  used  to  map  a 
query  or  document  into  a  concept  vector. 
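
One way to picture a dictionary record is the Python sketch below. It is a sketch under stated assumptions: the report says only that each stem carries a three-component concept vector whose first component is a unique concept number with weight unity and whose other two components are null. The class and function names are hypothetical, and assigning concept numbers sequentially in alphabetical order is an assumption, not something the report confirms.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# One component is a (concept number, weight) pair, or None for a null slot.
Component = Optional[Tuple[int, float]]


@dataclass(frozen=True)
class DictionaryEntry:
    stem: str
    components: Tuple[Component, Component, Component]


def make_entry(stem: str, concept_number: int) -> DictionaryEntry:
    """First component: unique concept with weight 1.0; the other two are null."""
    return DictionaryEntry(stem, ((concept_number, 1.0), None, None))


def build_dictionary(stems: Iterable[str]) -> Dict[str, DictionaryEntry]:
    """Assumption: concepts are numbered 1, 2, 3, ... in alphabetical stem order."""
    ordered = sorted(set(stems))
    return {stem: make_entry(stem, number)
            for number, stem in enumerate(ordered, start=1)}


dictionary = build_dictionary(["ABLE", "ABILIT", "ABLE"])
```

Leaving two null slots per stem keeps room for a stem to map onto more than one concept later without changing the record layout.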

File Name   Contents                                  When Printed
---------   ---------------------------------------   ---------------------------
DATA1       For each document: title, author,         Selective printing for
            abstract.                                 retrieved documents.

WORDS       Common words and suffixes for stem        Not printed.
            analysis.

DICTNRY     Dictionary: for each content stem, the    Not printed.
            stem and its corresponding
            concept-weight pairs.

CONCEPTS    One concept vector for each document      Printed by option "CON".
            in DATA1.

MESSAGES    Catalog of user messages that can be      Printed as required.
            transmitted by the On-Line System.

OFFLINE     Abstracts to be printed off-line.         Not printed by the Dialogue
                                                      Processor (see III.2.6).

Figure III-2(a) Files Accessed by the
On-Line System: GECOS III Files


Figure III-2(b) Files Accessed by the
On-Line System: Core-Resident Files

Figure III-3 DATA1 (concluded)


0001 ABOUT        0002 ABOVE        0003 ACROSS       0004 AFTER
0005 AGAINST      0006 ALL          0007 ALMOST       0008 ALONE
0009 ALONG        0010 ALSO         0011 ALTHOUGH     0012 ALWAYS
0013 AMONG        0014 AM           0015 AND          0016 ANOTHER
0017 AN           0018 ANYBODY      0019 ANYONE       0020 ANY
0021 ANYTHING     0022 ANYWHERE     0023 APART        0024 ARE
0025 AROUND       0026 A            0027 ASIDE        0028 AS
0029 AT           0030 AWAY         0031 AWFULLY      0032 BECAUSE
0033 BEEN         0034 BEFORE       0035 BEHIND       0036 BEING
0037 BELOW        0038 BE           0039 BETWEEN      0040 BEYOND
0041 BOTH         0042 BUT          0043 BY           0044 CANNOT
0045 CAN          0046 COULD        0047 DID          0048 DOES
0049 DOING        0050 DONE         0051 DO           0052 DOWN
0053 DURING       0054 EACH         0055 EITHER       0056 ELSE
0057 ELSEWHERE    0058 ENOUGH       0059 ETC          0060 EVEN
0061 EVER         0062 EVERYONE     0063 EVERY        0064 EVERYTHING
0065 EVERYWHERE   0066 EXCEPT       0067 FEW          0068 FOR
0069 FORTH        0070 FROM         0071 FURTHERMORE  0072 GET
0073 GETS         0074 GOT          0075 HAD          0076 HARDLY
0077 HAS          0078 HAVE         0079 HAVING       0080 HENCE
0081 HEREIN       0082 HERE         0083 HER          0084 HERSELF
0085 HE           0086 HIM          0087 HIMSELF      0088 HIS
0089 HITHER       0090 HOWBEIT      0091 HOWEVER      0092 HOW
0093 IF           0094 INASMUCH     0095 INDEED       0096 INNER
0097 IN           0098 INSOFAR      0099 INSTEAD      0100 INTO
0101 INWARD       0102 I            0103 IS           0104 IT
0105 ITSELF       0106 ITS          0107 JUST         0108 KEEP
0110 LEAST        0111 LESS         0112 LEST         0113 MANY
0114 MAY          0115 ME           0116 MIGHT        0117 MINE
0118 MOREOVER     0119 MORE         0120 MOST         0121 MUCH
0122 MUST         0123 MY           0124 MYSELF       0125 NEITHER
0126 NEVERTHELESS 0127 NEXT         0128 NOBODY       0129 NONE
0130 NOR          0131 OR           0132 NO           0133 NOTHING
0134 NOT          0135 NOWHERE      0136 OF           0137 OH
0138 ONE          0139 ONES         0140 ONLY         0141 ON
0142 ONTO         0143 OTHER        0144 OTHERS       0145 OTHERWISE

Figure III-4 WORDS Listing
(continued)

0146 OUGHT        0147 OUR          0148 OURSELVES    0149 OURS
0150 OUTSIDE      0151 OVER         0152 OWN          0153 PER
0154 PLEASE       0155 PLUS         0156 QUITE        0157 RATHER
0158 REALLY       0159 RIGHT        0160 SELF         0161 SELVES
0162 SEVERAL      0163 SHALL        0164 SHE          0165 SHOULD
0166 SINCE        0167 SIX          0168 SOMEBODY     0169 SOME
0170 SOMETHING    0171 SOMETIMES    0172 SOMEWHAT     0173 SO
0174 STILL        0175 SUCH         0176 TEN          0177 THAN
0178 THAT         0179 THEIR        0180 THEIRS       0181 THEM
0182 THEMSELVES   0183 THENCE       0184 THEN         0185 THEREBY
0186 THEREFORE    0187 THERE        0188 THE          0189 THESE
0190 THEY         0191 THIS         0192 UPWARD       0193 THOSE
0194 THOUGH       0195 THROUGHOUT   0196 THUS         0197 TOGETHER
0198 TOO          0199 TO           0200 TOWARD       0201 TWO
0202 UNDERNEATH   0203 UNDER        0204 UNLESS       0205 UNTIL
0206 UNTO         0207 UPON         0208 UP           0209 US
0210 VERY         0211 WAS          0212 WELL         0213 WERE
0214 WE           0215 WHATEVER     0216 WHAT         0217 WHENCE
0218 WHENEVER     0219 WHEN         0220 WHERE        0221 WHEREVER
0222 WHETHER      0223 WHICH        0224 WHILE        0225 WHOM
0226 WHO          0227 WHOSE        0228 WHY          0229 WILL
0230 WITHIN       0231 WITHOUT      0232 WITH         0233 WOULD
0234 YES          0235 YET          0236 YOUR         0237 YOURSELF
0238 YOURSELVES   0239 YOURS        0240 YOU

1001 (illegible)  1002 E            1003 S            1004 Y

2001 'S           2002 AL           2003 AN           2004 AR
2005 CY           2006 ED           2007 ED           2008 EN
2009 ER           2010 ET           2011 IC           2012 LY
2013 ON           2014 OR           2015 OU           2016 RY
2017 S'           2018 TH

3001 AGE          3002 ANT          3003 ARY          3004 ATE
3005 BAR          3006 CAN          3007 DER          3008 EED
3009 ENT          3010 EST          3011 ETH          3012 FUL
3013 GEN          3014 IAL          3015 IAN          3016 IED
3017 IES          3018 ING          3019 ISH          3020 ISM
3021 ION          3022 IST          3023 ITY          3024 IVE
3025 IZE          3026 LAY          3027 LEL          3028 MEN
3029 MAN          3030 NOT          3031 ODE          3032 OSE
3033 OUS          3034 PLY          3035 STY          3036 TAL
3037 TER          3038 TIC          3039 TLE          3040 ULE
3041 URE          3042 VAR          3043 WAY

4001 ABLE         4002 ANCE         4003 CANT         4004 CIDE
4005 DUCE         4006 ENCE         4007 EVER         4008 HAND
4009 IENT         4010 ITIE         4011 LENT         4012 LERT
4013 LESS         4014 MATE         4015 MENT         4016 MITY
4017 NESS         4018 PEAR         4019 SERT         4020 SERT
4021 SHIP         4022 SING         4023 SORB         4024 TEEN
4025 THER         4026 TIAL         4027 TION         4028 TIZE
4029 TURB         4030 VICE         4031 WISE

Figure III-4 WORDS Listing
(concluded)

The dictionary file is generated by program DICGEN, which reads from the
test file of 50 documents. It reads a record and skips the rest of the record
if the record is title, author, end or corporate information. If the record is
the beginning of an abstract, the abstract is searched for stems. Words
which do not begin with an alphabetic character are excluded from
consideration. The stem analysis routine is employed, so that common words are
rejected. The program generates a table of up to 5000 stems.

The program first calls for the maximum number of dictionary
entries. It then processes abstracts from the DATA1 file until the specified
number of unique stems has been found, or the end of the file has been
reached. A PLUCK delimiter parameter of two is used, so words containing
hyphens are not split. It is required that the first character of any stem be
alphabetic, and stems not meeting this requirement are rejected. The
dictionary is alphabetized.

The program PLUCKs from DATA1 until the beginning of an
abstract is found. It then PLUCKs and STEMs, rejecting common words
through STEM, and directly rejecting stems that do not start with a letter
of the alphabet. A stem that meets these requirements is checked against the
dictionary, and if it is not already in the dictionary, it is added in the
correct order.
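
The loop just described can be sketched in Python as follows. This is not a transcription of DICGEN, whose listing appears in Section VI; the function names are hypothetical, whitespace splitting stands in for PLUCK with a delimiter parameter of two, and `toy_stem` with its three common words is a stand-in for the real STEM routine and file WORDS.

```python
import bisect

# Stand-in for the common words of file WORDS (hypothetical three-word subset).
COMMON_WORDS = {"THE", "OF", "AND"}


def toy_stem(word):
    """Stand-in for STEM: returns None for common words, else the bare word."""
    text = word.strip(".,;:()\"").upper()
    if not text or text in COMMON_WORDS:
        return None
    return text


def build_stem_list(abstracts, stem, max_entries=5000):
    """Collect unique stems in alphabetical order, stopping at max_entries."""
    dictionary = []
    for abstract in abstracts:
        for word in abstract.split():  # hyphenated words are not split
            if not word[:1].isalpha():
                continue  # words not beginning with a letter are excluded
            candidate = stem(word)
            if candidate is None or not candidate[:1].isalpha():
                continue  # common word, or stem not starting with a letter
            position = bisect.bisect_left(dictionary, candidate)
            if position < len(dictionary) and dictionary[position] == candidate:
                continue  # already in the dictionary
            dictionary.insert(position, candidate)  # keeps alphabetical order
            if len(dictionary) >= max_entries:
                return dictionary
    return dictionary


abstracts = ["The retrieval of documents and 50 abstracts.", "Retrieval systems"]
full_list = build_stem_list(abstracts, toy_stem)
capped_list = build_stem_list(abstracts, toy_stem, max_entries=2)
```

Inserting each new stem at its sorted position means the dictionary is alphabetized at every moment, so no separate sorting pass is needed when the limit is reached early.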

Programs  that  generate  little  or  no  Teletype  output  can  go  dead 
owing  to  computer  failure,  like  any  other  programs.  But  if  there  is  no 
expected  output,  the  user  cannot  detect  the  error.  To  avoid  this,  a  bell  is 
rung  at  the  Teletype  when  a  stem  is  entered  into  the  dictionary. 

The program was first written to produce only a list of stems.
Once this had been produced, the program was modified to generate the
stem-concept dictionary. Figure III-5 shows the 900 stems presently
included; the program can generate up to 5000 stems. Figure III-6 contains
the flowchart and Figure III-7 is a listing of the first twenty dictionary entries.
A listing of the program itself appears in Section VI.

AO            ABILIT        ABLE          ABREAST       ABSENC
ABSTRACT      ACADEM        ACCEPT        ACCES         ACCLAIM
ACCOUNT       ACCREDITA     ACHIEV        ACQUIR        ACQUISI
ACTIV         ACTIVIT       ACTUAL        ADAPT         ADDED
ADDIT         ADDS          ADEQU         ADMINISTRA    ADMINISTRAT
ADVANC        ADVANT        AEROSPAC      AFFAIR        AFFECT
AGE           AGENC         AGREE         AGREED        AID
AIDS          AIMED         AIMS          ALICE         ALLEG
ALLOW         AMERI         ANALOG        ANALYSI       ANALYST
ANALYZ        ANNUAL        APPEAR        APPLI         APPLICA
APPROACH      APT           ARCHIV        AREA          AREAS
(illegible)   ARISE         ARRANG        ARRIV         ARTICL
ARTICULA      ARTIS         ASKED         ASPECT        ASSES
ASSIGN        ASSIMIL       ASSIST        ASSOCIA       ATTAIN
ATTEMPT       ATTEN         ATTEND        ATTITUD       AUTHOR
AUTHOR#       AUTOMA        AVAIL         AVAILABIL     AWARE
BACHELOR*     BACK          BACKGROUND    BACKLOG       BALANC
BAR           BASED         BASIC         BASIS         BEARING
BEARS         BECOM         BEGIN         BEGUN         BENEFIT
BETTER        BIBLIOGRAPH   BIOLOG        BLOOMQU       BLUR
BODY          BOOK          BOOKMOBIL     BOOKS         BOUND
BRANCH        BREAKTHROUGH  BRIDG         BRIEF         BRIEFL
BRIEFLI       BROWS         BRYAN*        BUILT         BURDEN
CANAD         CAPABILIT     CARD          CARDS         CARRI
CARRY         CATALOG       CATALOGU      CAUSE         CENTER
CENTR         CENTRALIZ     CERTAIN       CHALLENG      CHANG

It should be remembered that there are several parameters of
the program that can be varied at will, and it is through inspection of
results such as those in this report that the parameters can be refined.

Adjustable are:

•  Common  words  in  the  file  WORDS,  by  adding  or 
deleting. 

•  Specific  stems  in  the  file  WORDS,  by  adding  and 
deleting. 

•  Treatment  of  the  hyphen  as  a  delimiter  or  not,  by 
selecting  3  or  2  as  a  parameter  of  PLUCK. 

•  Varying  the  minimum  stem  length  generated  by 
removal  (exclusive  of  treatment  of  double  conso¬ 
nants  and  "i"  before  "ly"),  by  varying  a  single 
number  in  STEM. 

•  Varying  the  number  of  passes  allowed,  by  varying 
a  single  number  in  STEM. 

Figure III-5 gives considerable insight into the operation of the
stem analyzer. Inspection of the figure reveals that the stem analyzer has
done a good job; there are very few adjacent stems that are forms of the
same word. An investigation of the number of artificial homographs created
by stem analysis would require a side-by-side comparison of words and the
stems they generate; thus, no evaluation of this aspect of stem analyzer
performance can be based on Figure III-5 alone.

An  important  use  of  Figure  III-5  is  in  determination  of  the  set¬ 
tings  of  the  various  parameters  of  the  stem  analysis  process.  For  example, 
it  appears  that  the  minimum  stem  length  should  be  set  at  four  characters, 
rather  than  five,  which  is  the  present  setting.  This  would  reduce  the  size  of 
the  stem  dictionary.  For  example,  the  words  "need",  "needed",  "needing", 
and  "needs"  would  all  be  mapped  into  "need"  if  the  minimum  stem  length 
were  reduced  to  four  characters.  Other  examples  of  four-letter  stems  that 
would  each  be  produced  from  several  entries  presently  in  the  stem  dictionary 
are  "stud",  "form",  and  "item". 
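
The effect of the minimum stem length on the "need" family can be checked with a small Python sketch. It is a deliberately reduced model, not the STEM routine itself: the three suffixes are a hypothetical subset of those in file WORDS, the treatment of double consonants and of "i" before "ly" is omitted, and the parameter names `min_stem_length` and `max_passes` are assumptions standing for the two adjustable numbers mentioned above.

```python
# Hypothetical three-suffix subset, longest first; the real list is in file WORDS.
SUFFIXES = ("ING", "ED", "S")


def stem(word, min_stem_length=5, max_passes=2):
    """Strip suffixes while the remainder keeps at least min_stem_length letters."""
    stem_text = word.upper()
    for _ in range(max_passes):
        for suffix in SUFFIXES:
            remainder = len(stem_text) - len(suffix)
            if stem_text.endswith(suffix) and remainder >= min_stem_length:
                stem_text = stem_text[:remainder]
                break
        else:
            break  # nothing was removed on this pass, so stop early
    return stem_text


words = ["need", "needed", "needing", "needs"]
stems_at_five = {stem(w, min_stem_length=5) for w in words}
stems_at_four = {stem(w, min_stem_length=4) for w in words}
```

With a minimum of five, every removal would leave the four-letter remainder "NEED", so nothing is stripped and four separate dictionary entries result; with a minimum of four, all four words collapse to the single stem "NEED".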

It is not clear from Figure III-5 how the hyphen should be treated
during stem analysis: as an alphabetic character or as a word delimiter.
The indication is that treating the hyphen as a delimiter would reduce
dictionary size, and that few artificial homographs would be created.

III.2.4 File CONCEPTS


File CONCEPTS contains one concept vector for each document
in the collection. This file is generated by program CONGRA.

Program CONGRA first reads the dictionary file, DICTNRY. It
then processes the data base (DATA1), finding the stem for each word, and
looking up the stem in the dictionary. The weight of each stem in the
document is the number of its occurrences, normalized so that the largest
weight in each document is unity. Up to fifty components of the concept
vector are entered for each document.
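
The normalization can be sketched in Python as below. This is an illustration rather than CONGRA's actual logic: the function and variable names are hypothetical, concept number 700 and the stem spellings are invented for the example, and the report does not say which components are dropped when a document yields more than fifty, so keeping the most frequent ones is an assumption.

```python
from collections import Counter


def concept_vector(stems, stem_to_concept, max_components=50):
    """Count concept occurrences, then scale so the largest weight is 1.0."""
    counts = Counter()
    for stem in stems:
        concept = stem_to_concept.get(stem)
        if concept is not None:
            counts[concept] += 1
    if not counts:
        return {}
    # Assumption: when there are too many concepts, keep the most frequent.
    kept = counts.most_common(max_components)
    largest = kept[0][1]
    return {concept: count / largest for concept, count in kept}


# Hypothetical dictionary excerpt and document stems.
stem_to_concept = {"INFORMA": 421, "PROCES": 648, "RETRIEV": 700}
document = ["INFORMA"] * 4 + ["PROCES"] * 2 + ["RETRIEV", "UNKNOWN"]
vector = concept_vector(document, stem_to_concept)
truncated = concept_vector(document, stem_to_concept, max_components=2)
```

A concept that occurs once in a document whose most frequent concept occurs four times receives a weight of 0.25 under this scheme.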

Since statistical filtering and document clustering are not
performed by the presently operating experimental prototype of the System, some
doubt existed whether this simple concept vector file would be sufficient for
testing the dialogue processor. Figure III-8 shows both the number of lines
of text processed as the program proceeded through the document collection,
and the number of components in the concept vectors for fifty documents,
and illustrates that the present file will be sufficient for testing purposes.
Figure III-9 is a listing of part of CONCEPTS, and Figure III-10 is a flowchart
of CONGRA.

III.2.5 File MESSAGES

File  MESSAGES  contains  the  list  of  System-user  messages,  with 
five-digit  line  numbers.  For  each  message,  the  terse  form  is  stored  in  the 
file  before  the  verbose  form.  Within  either  form,  the  lines  of  multiple-line 
messages  appear  in  the  order  in  which  they  are  printed.  The  line  numbers 
identify  the  messages  as  follows: 

Figure III-9 (concluded)
CONCEPTS File


Figure III-10 (concluded)
Program CONGRA

Let the five digits of a line number be represented as NNTLL.
Then:

     NN  =  message number

     T   =  0 if the line is a component of the terse form
            1 if the line is a component of the verbose form

     LL  =  the number of the line within a multiple-line
            message.


In  order  to  distinguish  between  blanks  that  fill  out  a  line  and 
blanks  that  are  important  for  spacing,  each  line  of  a  message  is  terminated 
by  a  vertical  arrow. 
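
The NNTLL convention and the line terminator can be illustrated with a short Python sketch. The function names are hypothetical, and the use of the character "↑" for the vertical arrow is an assumption about how the Teletype symbol would be represented; the sample line numbers are ones that appear in the MESSAGES listing.

```python
def decode_line_number(line_number):
    """Split an NNTLL line number into message number, form, and line index."""
    digits = f"{int(line_number):05d}"  # short numbers such as 7001 gain a leading zero
    if len(digits) != 5:
        raise ValueError(f"expected at most five digits, got {line_number!r}")
    form_digit = digits[2]
    if form_digit not in "01":
        raise ValueError(f"form digit must be 0 or 1, got {form_digit}")
    return {
        "message": int(digits[:2]),
        "form": "terse" if form_digit == "0" else "verbose",
        "line": int(digits[3:]),
    }


def message_text(stored_line, terminator="↑"):
    """Keep blanks before the terminator; drop it and any fill blanks after it."""
    end = stored_line.find(terminator)
    return stored_line if end < 0 else stored_line[:end]


verbose_line = decode_line_number(28106)
terse_line = decode_line_number("7001")
text = message_text("SPACE DESIRED?:  ↑   ")
```

Cutting at the terminator rather than trimming trailing blanks is what lets a message end in deliberate spacing, such as the blanks left after a prompt.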


Figure III-11 is a listing of file MESSAGES.


III.2.6 File OFFLINE


File  OFFLINE  is  used  to  store  documents  temporarily  for  later 
printing  offline  on  a  high-speed  printer.  If  offline  printing  is  requested,  file 
OFFLINE  is  created,  with  the  desired  contents,  but  it  is  not  printed  auto¬ 
matically.  The  user  may  "PERM"  the  file,  however. 

III.2.7 Core-Resident Files


This subsection first defines the use of certain major variables,
arrays and families of arrays, and then shows how several of them are
allocated to common storage. Figure III-12(a) lists the families of arrays;
Figure III-12(b) lists all the main program variables, and Figure III-12(c)
the common variables and arrays.

Figure III-11 MESSAGES Listing (legible lines only)

7001 PRINT QUERY VECTOR?:↑
7101 DO YOU WANT TO SEE THE QUERY CONCEPT VECTOR?


Figure III-11 MESSAGES Listing (continued; legible lines only)

28104 A NEW SEQUENCE OR SIGNING OFF.↑
28105 ↑
28106 "MOD" - MODIFY OR REPLACE THE PRESENT QUERY AND CONTINUE↑
28107 THE PRESENT QUERY SEQUENCE.↑
28108 ↑


44101  SHOULD  PRINTING  OF  BIBLIOGRAPHIC  DATA  PREVIOUSLY  PRINTED  BE » 

44102  SUPPRESSED? * » 

45001  SPACES  IN  TEMP.  WANT  M0RE?  *  » 

45101  SPACES  EXIST  IN  THE  TEMPORARY  FILE  FOR  NEW  RETRIEVALS.  IS  MOREt 

45102  SPACE  DESIRED?: » 




Figure III-11 (concluded)
MESSAGES File


Name     Contents

TEMP     Temporary file: accession number, temporary identification
         number, correlation coefficient and rank when last retrieved,
         print suppression flag and flag indicating if retrieved on
         last executed retrieval.

PRE      Words for queries: the words, their stems and concept-weight
         pair mappings.

QUERY    The query concept vector.

Figure III-12(a)
Families of Arrays


Name         Type   Use

IX                  Next available temporary identification number.
JX           I      Number of entries presently in TEMP.
NEWQ         L      Mode setting precedes initial query.
RFLG         L      Present query not initial.
DEFLG        L      DEL entered through RET.
WFLG                MOD has altered PRE.
SEEFLG       L      SEE activated by MOD.
WRDFLG       L      WRD activated by MOD.
DOCDOC       L      Last retrieval in present sequence used
                    document-document correlation.
TERSE *      L      Terse dialogue: Mode 1 selected.
SKIP1        L      Skip initial query: Mode 2 selected.
PRINTQ       L      QUERY available immediately before retrieval:
                    Mode 3 selected.
OPTION       L      HELP prints all options: Mode 5 selected.
LGSTACNO *   I      Largest document accession number in the
                    collection.

* in common storage

Figure III-12(b)
Program Variables and Arrays

Name       Dimension   Type   Use

TEMP1      50          I      accession number of document in
                              temporary file
TEMP2      50          I      temporary identification number of
                              document in temporary file
TEMP3      50          F      correlation coefficient of document in
                              temporary file when last retrieved
TEMP4      50          I      rank of document in temporary file when
                              last retrieved
TEMP5      50          L      set when printing of bibliographic data
                              for document in temporary file is to be
                              suppressed
TEMP6      50          L      set when the last executed retrieval
                              command retrieved this document
QUERY1     50          I      concept number for a component of the
                              present query vector
QUERY2     50          F      concept weight for a component of the
                              present query vector
NONO       100         I      accession number of documents for which
                              retrieval is suppressed
TERSE      -           L      set if terse mode is selected
LGSTACNO   -           I      largest accession number in document
                              file
PRE1       5 x 25      A      present query word
PRE2       3 x 25      A      present query stem
PRE3       3 x 25      I      concept numbers for stem
PRE4       3 x 25      F      concept weights for stem

Figure III-12(c)
Common Storage
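
The layout of the temporary file can be pictured with a short sketch. It is written in Python rather than the FORTRAN of the listings, and it is an illustration only: the class and method names are hypothetical, the starting value of IX is assumed to be one, and the computation of free spaces as 50 - JX is read from the "Print 50 - JX" box of the dialogue processor flowchart.

```python
from dataclasses import dataclass, field

TEMP_CAPACITY = 50  # dimension of TEMP1 through TEMP6 in Figure III-12(c)


@dataclass
class TempEntry:
    accession: int                 # TEMP1
    temp_id: int                   # TEMP2
    correlation: float             # TEMP3
    rank: int                      # TEMP4
    suppress_print: bool = False   # TEMP5
    retrieved_last: bool = False   # TEMP6


@dataclass
class TempFile:
    entries: list = field(default_factory=list)
    next_id: int = 1               # IX (initial value is an assumption)

    @property
    def count(self):               # JX
        return len(self.entries)

    def free_spaces(self):
        # Spaces left for new retrievals, as reported in message 45.
        return TEMP_CAPACITY - self.count

    def add(self, accession, correlation, rank):
        # Hypothetical helper: store one retrieved document.
        if self.free_spaces() == 0:
            raise OverflowError("temporary file is full")
        entry = TempEntry(accession, self.next_id, correlation, rank,
                          retrieved_last=True)
        self.entries.append(entry)
        self.next_id += 1
        return entry.temp_id
```

Keeping the six TEMP arrays together as one record per document is a convenience of the sketch; the program itself holds them as parallel arrays in common storage.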
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III.  3  FLOWCHART  OF  THE  DIALOGUE  PROCESSOR 

Figure III-13 contains a flowchart of the dialogue processor,
whose operation is described in subsection III.1.1.

Figure III-13
DIALOGUE PROCESSOR
[Legible flowchart text: "Generate QUERY vector from its present
components plus data in PRESNT table"; "Add Accession Numbers on TEMP
to NONO"; "Clear Print Suppress Part of TEMP tables"; Print 46;
Print 42; Print IX; Print 50 - JX; YESNO; .T. to DEFLG.]


SECTION  IV 


SUBROUTINES  CALLED  BY  THE  DIALOGUE  PROCESSOR 

This  section  briefly  describes,  and  flowcharts,  each  subprogram 
called  by  the  dialogue  processor. 

Figure  IV-1  is  a  directory  of  all  programs  and  subprograms 
that  comprise  the  dialogue  processor  and  the  file-constructing  programs. 
This  directory  tabulates  the  location  within  this  report  of  the  description 
and  listing  of  each  routine.  Flowcharts  are  all  co-located  with  descriptions. 

Name       Calls                  Called By               Description  Listing

PROGRAMS

CONGRA     PLUCK, STEM                                    III-64       VI-41
DIALOGUE   All subroutines,                               III-1        VI-22
           directly or
           indirectly.
DICGEN     PLUCK, STEM                                    III-55       VI-43

SUBPROGRAMS

DOCK       None                   DIALOGUE                IV-1         VI-2
LENGTH     None                   STEM, NUMBER            IV-3         VI-4
LOOKUP     None                   DIALOGUE                IV-6         VI-7
NUMBER     LENGTH, OUT, PLUCK,    DIALOGUE                IV-11        VI-10
           PUT, YESNO, ZORCH
OUT        None                   DIALOGUE, NUMBER,       IV-19        VI-13
                                  YESNO
PLUCK      PUT                    NUMBER, STEM, CONGRA,   IV-19        VI-14
                                  DICGEN, DIALOGUE
PUT        None                   PLUCK, NUMBER, STEM,    IV-23        VI-16
                                  CONGRA, DICGEN
STEM       LENGTH, PLUCK, PUT     DIALOGUE, CONGRA,       IV-23        VI-17
                                  DICGEN
WHERE      None                   DIALOGUE                IV-28        VI-20
YESNO      OUT                    DIALOGUE, NUMBER        IV-34        VI-21
ZORCH      None                   NUMBER                  IV-11        VI-10

Figure IV-1
Directory of Programs and Subprograms

IV.1 FUNCTION DOCK

IV.1.1 Purpose

To fetch authors' names, titles and abstracts from the test
data file DATA1 selectively.

IV.1.2 Action

Upon a call to the logical function DOCK (CODE, I, ARRAY,
COUNT), the following are input parameters:

CODE, an integer that is 1, 2 or 3 according as titles, authors
or abstracts are desired.

I, an integer that specifies the accession number of the desired
document.

Output parameters are:

ARRAY, eighteen words of ASCII information representing one
line of the returned alphanumeric information.

COUNT, the number of words in ARRAY preceding the point
(if any) where all remaining words are filled with blanks.
This is used to avoid printing blanks that fill out lines.

DOCK, the logical value of the function. It is true if more lines
of the requested data exist; subsequent calls to DOCK with the
same input parameters will return additional lines of the data
in ARRAY until the last line is delivered. When the last line
has been transmitted, the value of DOCK will be false.

If CODE is different from 1, 2 or 3 or I is greater than the
largest accession number, the routine will print an error
message and return with DOCK false and COUNT=0. Recall that
accession numbers entered by remote users pass through
NUMBER, and that subprogram has the task of gracefully
informing the user when he specifies an illegal accession
number.

Figure IV-2
Function DOCK
[Legible flowchart text: entry DOCK (CODE, I, ARRAY, COUNT); note that
FLAG is initially false; test "I > LGSTACNO or CODE < 1 or CODE > 3",
on YES print error message and set DOCK and FLAG to .F. and COUNT
to 0; BEGIN FILE; READ DATA1 record into ARRAY; END FILE test.]


IV.  1.  3  Method 


Data  are  read  sequentially,  with  the  first  four  characters  of 
each  line  being  scanned  in  order  to  determine  the  beginning  of  documents 
and  fields  within  documents.  Before  a  line  is  transmitted,  the  following 
line  is  checked  to  see  if  the  transmitted  line  is  the  last  of  a  sequence.  If 
so,  DOCK  is  made  false.  If  more  lines  follow,  the  second  line  is  saved  for 
transmission  on  the  next  call. 
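
The look-ahead described above, together with the computation of COUNT, can be sketched as follows. This is an illustration in Python, not the FORTRAN of the listing. The class name Dock and the helper count_words are hypothetical, the lines of the requested field are assumed to have been located already (the scan of the first four characters of each record is omitted), and a word is taken to hold four characters, since eighteen words carry a 72-character line.

```python
WORDS_PER_LINE = 18
CHARS_PER_WORD = 4
LINE_CHARS = WORDS_PER_LINE * CHARS_PER_WORD


def count_words(line):
    """COUNT: number of words before the trailing all-blank words."""
    padded = line.ljust(LINE_CHARS)[:LINE_CHARS]
    words = [padded[k:k + CHARS_PER_WORD]
             for k in range(0, LINE_CHARS, CHARS_PER_WORD)]
    count = WORDS_PER_LINE
    while count > 0 and words[count - 1].strip() == "":
        count -= 1
    return count


class Dock:
    """Delivers the lines of one field, one per call."""

    def __init__(self, lines):
        self._lines = iter(lines)
        # The line to be transmitted on the next call is held here.
        self._saved = next(self._lines, None)

    def next_line(self):
        """Return (more, array, count), mirroring DOCK, ARRAY and COUNT."""
        line = self._saved
        if line is None:
            return False, "", 0
        # Read one line ahead to learn whether this line is the last.
        self._saved = next(self._lines, None)
        return self._saved is not None, line, count_words(line)
```

A caller would print the first COUNT words of each returned line and stop as soon as the first returned value is false.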


IV.2 FUNCTION LENGTH

IV.2.1 Purpose

Function LENGTH splits strings and counts their length.

IV.2.2 Action

In a call to LENGTH (INPUT, RIGHT, LEFT, CUT), INPUT and CUT are the
input parameters. All variables in the calling program are typed
alphanumeric except for CUT and the value of the function, which
are integers. The alphanumeric variables are stored in arrays of five
words, left justified with remaining spaces filled out by blanks.

Figure IV-3
Function LENGTH

LENGTH takes the input string and counts the number of characters
in it, up to the first blank encountered or the end of a totally filled
input array. The count is returned as the value of the function; call
this value L. The characters may be numbered 1, 2, ..., L, with the
first character in the input being number one. Upon return, characters
1, 2, ..., L-CUT are in LEFT and characters L-CUT+1, L-CUT+2, ..., L
are in RIGHT, so that RIGHT holds the last CUT characters of the
string. If CUT is greater than or equal to L, the entire string is
returned in RIGHT; if CUT=0 the entire string goes to LEFT. Negative
values of CUT are not allowed. INPUT is not altered by use of the
function, and a call with INPUT null (all blanks) results in a return
with both LEFT and RIGHT null and the value of the function equal to
zero.
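
The action of LENGTH can be illustrated by a brief sketch, given here in Python rather than FORTRAN. It assumes four characters to the word, so that a five-word array holds twenty characters, and it reads RIGHT as the last CUT characters of the string, which is what the two limiting cases (CUT=0 and CUT not less than L) imply. The function name and the ordering of the returned values are choices of the sketch.

```python
FIELD_CHARS = 20  # five words of four characters each (assumed)


def length(input_text, cut):
    """Return (L, RIGHT, LEFT) for the given INPUT and CUT."""
    if cut < 0:
        raise ValueError("negative values of CUT are not allowed")
    text = input_text[:FIELD_CHARS]
    blank = text.find(" ")
    count = len(text) if blank < 0 else blank   # L
    word = text[:count]
    cut = min(cut, count)          # CUT >= L sends everything to RIGHT
    left = word[:count - cut]      # characters 1 .. L-CUT
    right = word[count - cut:]     # characters L-CUT+1 .. L
    return count, right.ljust(FIELD_CHARS), left.ljust(FIELD_CHARS)
```

For example, length("STOPPING", 3) returns a count of 8 with "ING" in RIGHT and "STOPP" in LEFT, each filled out with blanks.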

IV.  2.  3  Method 

See  the  flowchart  of  LENGTH.  Note  that  in  order  to  obtain 
efficient  operation,  characters  are  moved  by  the  word  in  the  formation  of 
LEFT  to  as  great  a  degree  as  possible. 


IV.  3  FUNCTION  LOOKUP 

IV. 3.1  Purpose 

To  search  the  concept  dictionary  file  for  a  given  stem,  and  to 
provide  the  concept  vector  for  that  stem  if  it  is  present  in  the  dictionary. 

IV.3.2 Action

The function LOOKUP(I) is logical in type. Its input parameters
are the value of I and the stem located in the I-th position of the
Present Table, PRE2(1,I), PRE2(2,I) and PRE2(3,I). The function
searches the concept dictionary file DICTNRY for the stem in the
specified position of the Present Table, returning a value in the
function name of FALSE if the stem is not in the dictionary. If the
stem is found, a value of TRUE is returned in the function name and
concept-weight pairs are returned in the Present Table. The concept
codes are placed in PRE3(1,I) through PRE3(3,I) and the corresponding
weights are placed in PRE4(1,I) through PRE4(3,I).

In  the  event  that  a  stem  is  not  found  in  the  dictionary,  values  of 
zero  are  returned  for  both  the  concept  codes  and  the  concept  weights  in  the 
Present  Table. 

IV.3.3 Method

The dictionary lookup itself is performed using a binary search,
and the allowable length of the dictionary is unlimited. This is
accomplished by working with segments of the dictionary. The present
segment length is 500 stems with their associated concept code-weight
pairs, but this can be altered as available core permits.

When  the  routine  is  first  entered,  the  status  of  the  present  dictio¬ 
nary  segment  is  checked.  If  no  segment  is  in  core,  then  the  first  segment  is 
read.  There  are  flags  which  are  set  to  indicate  if  the  present  segment  is  the 
first  one  (read  immediately  following  a  BEGIN  FILE  DICTNRY  or  on  initial 
entry  to  LOOKUP),  or  the  last  (end  of  DICTNRY  file  encountered  on  last 
reading  of  a  DICTNRY  segment). 

A stem is first checked against the lowest- and highest-collating
stems of the dictionary segment presently in core. If it is outside of
the limits, a higher or lower segment is read into core as appropriate.
An exception occurs if the stem collates above the present segment and
the present segment is the highest ordered one, or if the present
segment is the lowest ordered and the stem collates below it. Then,
clearly, the stem is not in the dictionary and so the associated
concept code and weight positions in the Present Table are set to zero
and the function returns with a FALSE value.

Once the correct segment is found, it is searched using a binary
technique. It is necessary to establish a search starting point, and so
first the smallest power of two not less than the number of entries in
the present dictionary segment is computed. This step is omitted if the
previous binary search operated on a segment of length equal to the
present segment length; in practice all segments but the last one are
of equal length owing to the characteristics of DICGEN, the dictionary
generation program.

One-half of the value (smallest power of two not less than the
size of the present segment) is used for the starting location of the
search. After an unsuccessful comparison, a distance is either added to
or subtracted from the starting location, according as the search found
a stem above or below the desired stem. The distance is initially
one-half of the starting location, and is of course halved after its
application. If the distance is reduced to zero, the stem desired is
not in the dictionary and so the corresponding concept code-weight
pairs are set to zero and the function returns with a value of FALSE.
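
The search within one segment may be sketched as follows, in Python rather than FORTRAN. The sketch is hedged in three respects: the function name is hypothetical; the reading and switching of segments is left out, so the segment is simply a sorted list already in core; and the comparison against the highest stem of the segment is assumed to recognize a match there, because the stepping search alone cannot reach the last position when the segment length is an exact power of two.

```python
def search_segment(stems, target):
    """Return the 1-based position of target in the sorted list stems,
    or None if it is absent."""
    n = len(stems)
    # Limit check against the lowest and highest stems of the segment.
    if n == 0 or target < stems[0] or target > stems[-1]:
        return None
    if target == stems[-1]:
        return n
    # Smallest power of two not less than the number of entries.
    power = 1
    while power < n:
        power *= 2
    pos = power // 2        # starting location
    dist = pos // 2         # initial distance
    while True:
        # Locations beyond the end of a short segment count as "above".
        probe = stems[pos - 1] if pos <= n else None
        if probe == target:
            return pos
        if dist == 0:
            return None     # distance reduced to zero: not present
        if probe is None or probe > target:
            pos -= dist
        else:
            pos += dist
        dist //= 2
```

With a 500-stem segment the power of two is 512, the search starts at location 256, and at most nine comparisons follow the limit check.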


IV.  4  SUBROUTINE  NUMBER  (AND  ASSOCIATED  FUNCTION  ZORCH) 

IV.  4. 1  Purpose 

As  described  in  the  On-Line  Retrieval  Interim  Report  (5),  this 
routine  reads  document  specifications  from  the  remote  terminal  in  a 
variety  of  forms,  and  returns  document  accession  numbers  and  a  status 
indicator. 

Figure IV-5
Subroutine NUMBER

IV.4.2 Action

The subroutine acts as described in the Interim Report, with
three exceptions:

1.  The Interim Report described a version of the routine
    which would print an error message at the remote console
    and request corrected input in the event that a nonexistent
    document is specified. In the testing of NUMBER, it
    became apparent that this is not desirable in the case where
    a range specification of temporary document identification
    numbers is used. For example, a specification of all
    documents with temporary numbers in the range between thirteen
    and thirty ("13-30") would be reasonable even in the event
    that document number twenty-five had previously been
    deleted from the temporary file. The user should not be
    forced to specify "13-24" and "26-30". The routine accepts
    such range specifications even though not all documents
    exist. However, it will not accept a specification if none of
    the specified documents are present in the temporary file.

2.  Leading zeros do not have to be provided with accession
    number specifications: "A17", "A017" and "A00017" are
    all accepted as identifying the document whose accession
    number is seventeen.

3.  When the user is asked if he wishes to specify additional
    documents, a reply of "OPTIONS" is treated the same way
    as a reply of "NO".

Leading and trailing blanks are allowed in the specification.
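
The first two exceptions can be illustrated by a small sketch in Python. It is not the logic of NUMBER, which follows the Interim Report and handles more forms of specification than are shown here. The function name, the representation of the temporary file as a mapping from temporary number to accession number, and the use of None to signal a rejected specification are all assumptions of the sketch.

```python
def _digits(text):
    # True only for a non-empty run of the characters 0 through 9.
    return text != "" and all("0" <= ch <= "9" for ch in text)


def parse_spec(spec, temp, largest_accession):
    """Return the accession numbers named by spec, or None if rejected.

    temp maps temporary identification numbers to accession numbers.
    """
    text = spec.strip().upper()          # leading/trailing blanks allowed
    if text.startswith("A"):             # accession number, e.g. "A00017"
        if not _digits(text[1:]):
            return None
        number = int(text[1:])           # leading zeros are immaterial
        return [number] if 1 <= number <= largest_accession else None
    low, dash, high = text.partition("-")
    if not _digits(low) or (dash and not _digits(high)):
        return None
    first = int(low)
    last = int(high) if dash else first
    # Accept the range if at least one document in it is present.
    found = [temp[t] for t in range(first, last + 1) if t in temp]
    return found or None
```

A range such as "13-30" is thus accepted when some of its documents have been deleted, and refused only when none remain.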


IV.4.3 Method

The subroutine uses PLUCK to read delimited strings from the
remote console, as that may be treated as a file. YESNO is used when
the user is asked if more documents are to be specified; all messages
are printed by means of calls to OUT. LENGTH and PUT are used by PLUCK,
and LENGTH is also called directly when the input specification strings
are analyzed.

In order to avoid conversion problems, the transformation from
ASCII to internal integer representation is programmed directly, rather
than achieved by use of the ENCODE/DECODE statements. For this same
reason, a small function ZORCH is used so that ASCII characters may be
handled as integers. Since ZORCH is required for this reason, it is
convenient to include in it a detection of non-numeric characters.

Except as noted above, the logic of NUMBER follows the
description in the Interim Report.
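
The direct conversion may be pictured as follows. The sketch is in Python and is an assumption throughout: the report does not give the calling sequence of ZORCH, so the convention used here (a digit value of 0 to 9, or -1 for a non-numeric character) and the helper to_integer are hypothetical.

```python
def zorch(ch):
    """Digit value of one character, or -1 if it is non-numeric."""
    code = ord(ch) - ord("0")
    return code if 0 <= code <= 9 else -1


def to_integer(text):
    """Convert a digit string to an integer without library conversion.

    Returns None if the string is empty or holds a non-numeric character.
    """
    if text == "":
        return None
    value = 0
    for ch in text:
        digit = zorch(ch)
        if digit < 0:
            return None
        value = value * 10 + digit
    return value
```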
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IV.  5 


SUBROUTINE  OUT 


IV.5.1 Purpose

To print standard messages at the remote terminal.

IV.5.2 Action


A  call  to  OUT(J)  causes  standard  message  number  J  to  be  printed 
at  the  remote  terminal.  Recall  that  some  messages  exist  in  both  terse  and 
verbose  forms.  A  logical  parameter  in  common,  TERSE,  is  true  if  terse 
dialogue  is  desired.  The  verbose  form  of  the  message  is  printed  unless 
both  TERSE  is  true  and  message  number  J  possesses  a  terse  form.  In  the 
event  that  OUT(J)  is  called  with  a  value  of  J  corresponding  to  no  message 
number,  the  following  error  message  is  printed  and  control  returned  to  the 
calling  program: 

ERROR  IN  'OUT'  SUBROUTINE  AT  MESSAGE  #  nn 
where  "nn"  is  the  invalid  value  of  J  used  in  the  call. 
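
The selection between terse and verbose forms can be sketched as below, in Python. The two messages are those legible in the MESSAGES listing of Figure III-11; holding them in a dictionary, treating message 44 as having no terse form, and returning the lines instead of printing them are assumptions made for illustration only.

```python
MESSAGES = {
    44: {"verbose": ["SHOULD PRINTING OF BIBLIOGRAPHIC DATA PREVIOUSLY "
                     "PRINTED BE",
                     "SUPPRESSED?"]},
    45: {"terse": ["SPACES IN TEMP. WANT MORE?"],
         "verbose": ["SPACES EXIST IN THE TEMPORARY FILE FOR NEW "
                     "RETRIEVALS. IS MORE",
                     "SPACE DESIRED?"]},
}


def out(j, terse=False, messages=MESSAGES):
    """Return the lines of standard message j in the proper form."""
    forms = messages.get(j)
    if forms is None:
        return ["ERROR IN 'OUT' SUBROUTINE AT MESSAGE # %d" % j]
    # Verbose unless TERSE is true and a terse form exists.
    if terse and "terse" in forms:
        return forms["terse"]
    return forms["verbose"]
```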

IV.5.3 Method

For a description of the format of file MESSAGES, see section
III.2.5.


Figure IV-6
Subroutine OUT

IV.6 SUBROUTINE PLUCK

IV.6.1 Purpose

Subroutine PLUCK scans text files and returns character strings.
Input parameters to PLUCK determine the delimiters used in string
definition, the file to be searched and the starting point in that
file. Output parameters are the string found, its length, and the
position in the file of the start of the next scan. The position data
may be saved in order to resume searching a file following a search of
another file. If it is known that no intermediate activity will change
the status of the first file, a special call to PLUCK can be made in
order to avoid initialization after a switch from one file to another
and back to the first.

In order to illustrate the workings of PLUCK, its listing here
includes a test driver that reads from files CHICKEN and LENGTHFN
(which contains function LENGTH). The driver contains comment lines
that indicate the purpose of the calls to PLUCK, and Figure IV-8 shows
the output obtained. More extensive tests were performed than those
shown here.

Output from file CHICKEN

Figure IV-8
Demonstration of PLUCK

IV.6.2 Action

In a call to PLUCK (FILE, ARRAY, L, I, R, P), the parameters
have the following meaning:

FILE is a filename constant or variable, indicating the name of
the file to be read. It is not required that the file be
line-numbered. Unless either the last call to PLUCK obtained data
from the same file or contained information on the file's status
(see the case where I=0, below), the file will be rewound and
repositioned upon a call.

ARRAY contains the string found, up to 72 characters left-justified
in ASCII format and filled out with blanks.

L contains the number of characters in the string found. If the
end of the file is reached, L=0.

R is both an input and output parameter. In a call it indicates the
sequential number (starting at one) of the first record to be
searched. On output it indicates the number of the record
containing the string found, or the next record if the string found
was the last one in a record. If the end of file is reached, R=0.

P is like R, except that it indicates the next character position
to be searched within the record. If the end of file is reached,
P=0. If one of the calling parameters is outside the legal limits,
P=-100. It is assumed that the maximum length of a record is
72 characters, but this constraint is easily changed.

I is the input parameter which controls the action of PLUCK in
the selection of delimiters and the control of file initialization.

    I=0.  This call does not return a string. It is used when
    switching from the file last referenced back to a file
    referenced previously where the last value of R is known.
    It backspaces the previous file one record, and reads that
    record so that any following calls to that file specifying a
    record number R or greater do not require a complete file
    rewind and reread to record R.

    I=1.  The string starting with or following the character
    number P in record R is obtained. The only recognized
    delimiters are the space and end of record.

    I=2.  Like the case with I=1, except that comma, period,
    double quote, exclamation point, colon, semicolon, right
    and left parenthesis and question mark are recognized as
    delimiters in addition to space.

    I=3.  Like I=2, except that the dash is included as a
    delimiter.

    I=-1.  Like I=1, except that the first string of a record is
    ignored. This is for use with line-numbered files.

    I=-2; I=-3.  Like I=2 and I=3, respectively, except for
    line-numbered files.
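
The interplay of I, R and P can be seen in a short sketch, written in Python rather than FORTRAN. It is an illustration under stated assumptions: the file is represented as a list of records already in core, so the rewinding and repositioning of files (and hence the purpose of the I=0 call) is not modeled; the string is returned without blank fill; and the names pluck and pluck_all are hypothetical.

```python
SPACE = " "
PUNCTUATION = ',."!:;()?'   # added delimiters when |I| is 2 or 3
RECORD_WIDTH = 72


def delimiters(i):
    chars = SPACE
    if abs(i) >= 2:
        chars += PUNCTUATION
    if abs(i) == 3:
        chars += "-"
    return chars


def pluck(records, i, r, p):
    """Return (string, L, R, P) for a scan starting at record r, position p."""
    if i == 0:
        return "", 0, r, p            # repositioning call: no string returned
    if abs(i) > 3 or r < 1 or not 1 <= p <= RECORD_WIDTH:
        return "", 0, r, -100         # parameter outside the legal limits
    delims = delimiters(i)
    while r <= len(records):
        line = records[r - 1][:RECORD_WIDTH]
        pos = p - 1
        if i < 0:
            # Line-numbered file: ignore the first string of the record.
            k = 0
            while k < len(line) and line[k] in delims:
                k += 1
            while k < len(line) and line[k] not in delims:
                k += 1
            pos = max(pos, k)
        while pos < len(line) and line[pos] in delims:
            pos += 1
        start = pos
        while pos < len(line) and line[pos] not in delims:
            pos += 1
        if pos > start:
            text = line[start:pos]
            if any(c not in delims for c in line[pos:]):
                return text, len(text), r, pos + 1
            return text, len(text), r + 1, 1   # last string in the record
        r, p = r + 1, 1
    return "", 0, 0, 0                # end of file


def pluck_all(records, i):
    """Collect every string in the file by feeding R and P back in."""
    found, r, p = [], 1, 1
    while True:
        text, length, r, p = pluck(records, i, r, p)
        if length == 0:
            return found
        found.append(text)
```

The helper pluck_all shows the intended use: the values of R and P returned by one call are supplied unchanged to the next, and the scan ends when L comes back as zero.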


IV.6.3 Method

Figure IV-7 shows a flowchart for subroutine PLUCK; Figure
IV-8 is a listing of the program and a demonstration driver.

The use of the variables FILE1 and FILE2 is not obvious.
Certain versions of the compiler have shown errors in the handling of
IF statements dealing with filename variables, but all versions allow
the replacement statement to have filename variables on the right and
ASCII variables on the left. If difficulties are encountered with the
IF statement, then FILE1 and FILE2 may be declared ASCII rather than
FILENAME. This will overcome the compiler difficulties, but it must be
remembered that then only the leftmost four characters of the file
names will be compared. PLUCK calls on the function PUT, described
below.


IV.7 FUNCTION PUT

IV.7.1 Purpose

Function PUT is used by PLUCK to pack single characters into
words. Since it provides a generally useful capability, it has been
written as an integer function rather than as part of subroutine PLUCK.

IV.7.2 Action

Integer function PUT has three parameters: a 72 character
array ARRAY, an integer L and a single character A. A call to PUT (A,
L, ARRAY) puts character A into position L+1 of the 18 word array
ARRAY, and returns a function value of L+1. The character in A must be
left-justified and filled with blanks, and ARRAY must similarly contain
blanks.
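
The packing performed by PUT can be sketched in Python as follows. The sketch assumes four characters to the word, represents ARRAY as a list of eighteen four-character strings, and adds a range check that the report does not describe.

```python
WORDS = 18
CHARS_PER_WORD = 4


def blank_array():
    # An 18-word array filled with blanks, as PUT requires.
    return [" " * CHARS_PER_WORD for _ in range(WORDS)]


def put(a, l, array):
    """Place the character held in a at position l+1 of array; return l+1."""
    if not 0 <= l < WORDS * CHARS_PER_WORD:
        raise ValueError("position outside the 72-character array")
    word, offset = divmod(l, CHARS_PER_WORD)
    chars = list(array[word])
    chars[offset] = a[0]        # a is left-justified and blank-filled
    array[word] = "".join(chars)
    return l + 1
```

Because the function value is the new length, a caller can build a string with repeated statements of the form L = PUT(A, L, ARRAY).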


IV.8 SUBROUTINE STEM

IV.8.1 Purpose

Subroutine STEM performs stem analysis and common word
detection.

The degree of suffix removal is governed by a single constant
(Line 79750) of STEM. The desirable value of this constant was
anticipated to lie between four and six, as stated in the Interim
Report. Five has been selected for the present time, but of course it
can be revised in accordance with future experience. Note that stems of
fewer than five characters can be generated by the special cases which
remove a terminal "i" after ordinary removal of "ly" and which remove
the second of a double terminal consonant. For a discussion of this and
other aspects of the performance of STEM, see subsection III.2.3 of
this report.

Figure IV-9
Subroutine STEM

IV.  8.  2  Action 


STEM  (WORD,  STEMRD,  COUNT)  has  a  single  input,  an  array 
of  up  to  20  characters  named  WORD.  The  subroutine  returns  with  WORD 
unchanged,  the  stem  found  in  STEMRD  (another  array  of  up  to  20  characters) 
and  the  length  of  the  stem  in  COUNT.  If  the  input  word  is  common,  STEMRD 
contains  blanks  and  COUNT=0. 

STEM  requires  a  file  WORDS,  in  the  format  described  above, 
containing  up  to  250  common  words  and  up  to  50  suffixes  each  of  length  one, 
two,  three  and  four.  If  a  format  error  is  found  in  that  file,  an  error  message 
is  printed.  Processing  continues  using  the  part  of  the  file  read.  Of  course, 
certain  invalid  file  data  can  cause  a  TSS  system  abort,  over  which  the  Dialogue 
Processor  has  no  control. 

IV.  8.  3  Method 

The  common  words  and  stems  are  stored  in  a  separate  file, 
WORDS.  This  is  done  in  order  that  they  may  be  modified  without  altering 
and/or  recompiling  STEM.  This  file  is  read  and  stored  by  STEM  upon  the 
first  call  to  STEM,  and  then  the  file  is  closed. 

Stem  analysis  is  performed  as  described  in  the  Interim  Report. 
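
Since the stemming rules themselves are given only in the Interim Report, the following Python sketch is offered with reservations. It takes the governing constant to be the minimum length of the stem left by ordinary suffix removal, tries the longest suffix first, and applies the double-consonant rule whether or not a suffix was removed; each of these is an assumption. The word and suffix lists are illustrative and do not reproduce the WORDS file.

```python
MIN_STEM = 5          # the governing constant, at its present value
VOWELS = "AEIOU"
COMMON = {"THE", "AND", "OF"}                       # illustrative only
SUFFIXES = {1: {"S"}, 2: {"LY", "ED"},              # illustrative only
            3: {"ING"}, 4: {"TION"}}


def stem(word, common=COMMON, suffixes=SUFFIXES):
    """Return (STEMRD, COUNT) for the input WORD."""
    w = word.strip().upper()[:20]
    if not w or w in common:
        return "", 0                     # common word: blanks and COUNT=0
    for n in (4, 3, 2, 1):               # longest suffix first (assumed)
        ending = w[-n:]
        if len(w) - n >= MIN_STEM and ending in suffixes.get(n, ()):
            w = w[:-n]
            # Special case: terminal "i" left by the removal of "ly".
            if ending == "LY" and w.endswith("I"):
                w = w[:-1]
            break
    # Special case: second of a double terminal consonant.
    if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in VOWELS:
        w = w[:-1]
    return w, len(w)
```

Under these assumptions STOPPING yields STOP and HAPPILY yields HAP, both shorter than five characters because of the special cases, while RUNNING is left whole because removing "ing" would leave only four characters.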
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IV.  8.  4  Example 


A short program was written to read from file CHICKEN, with
contents as shown in Figure IV-10(a), using subroutine PLUCK and
placing the results in an eighteen word array FEATHERS. STEM was called
with FEATHERS as its argument, resulting in the output shown in Figure
IV-10(b). Each asterisk indicates a rejected common word.

The control parameter of PLUCK was set to +3, and functions
LENGTH and PUT were of course also loaded. It should be noted that the
use of FEATHERS as output of PLUCK and input to STEM is perfectly
permissible, even though the dimensioning statements within those
subroutines are different.


IV.  9  SUBROUTINE  WHERE 

IV.  9.  1  Purpose  and  Action

In the debugging of a complicated program such as the Dialogue
Processor, the programmer is frequently faced with the problem of
determining the path of control through the program. This subroutine is designed
to aid in that determination. Calls are of the form CALL WHERE (A, N),
where A is an ASCII constant and N is an integer. The subroutine responds
by printing the values of A and N. Successive calls print A
and N on one line until that line is filled; then a new line is started. There
are two exceptions to this:


1.  As debugging progresses, the programmer may wish to
    turn off the action of WHERE. Therefore, when it is first
    called, it prints "ACTIVATE TRACE?" An answer of "NO"
    will suppress all printing by WHERE; when called, it will
    immediately transfer control back to the calling program.

00  THE SURVEY SHOWS THAT, WHILE THE RESULTS ARE NOT INEXPLICABLY
01  CONTRADICTORY, DIFFERENCES IN PRINCIPLE AND METHOD MAKE IT IMPOSSIBLE
02  TO DEMONSTRATE CERTAIN, CLOSE AGREEMENT. THE AUTHOR SUGGESTS THAT
03  FUTURE SURVEYS SHOULD BE DESIGNED TO INCLUDE A FEW FEATURES WITH A
04  DELIBERATE RELATIONSHIP TO EARLIER SURVEYS SO THAT SOME VALID

Figure IV-10(b)
Results of STEM

Figure IV-11 (cont'd)
WHERE

2.  In a loop resulting in repeated calls to WHERE with
    unchanging arguments, repeated printing of the arguments
    would be unnecessary, wasteful and annoying. Therefore
    the arguments are printed once upon entry to the loop, and
    upon exit those arguments are followed by "*kk/", where
    kk is the number of times the loop was executed.
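
The behavior described above, including both exceptions, can be sketched as follows. This is a Python illustration and not a transcription of the FORTRAN listing in Figure VI-9. The class name, the 72-character line width, the single-blank separator and the unpadded repeat count are assumptions. The sketch also collects completed lines in a list instead of printing them at the terminal, so that the result can be inspected.

```python
class Where:
    """Trace aid modeled on CALL WHERE (A, N); details are assumed."""

    def __init__(self, ask=input, width=72):
        self.ask = ask        # reads the reply to "ACTIVATE TRACE?"
        self.width = width    # assumed line width
        self.active = None    # undecided until the first call
        self.last = None      # arguments of the previous call
        self.repeats = 0      # consecutive calls with those arguments
        self.line = ""        # line currently being filled
        self.lines = []       # completed lines

    def _emit(self, text):
        # Start a new line when the current one cannot hold the text.
        if self.line and len(self.line) + 1 + len(text) > self.width:
            self.lines.append(self.line)
            self.line = ""
        self.line = self.line + " " + text if self.line else text

    def _close_run(self):
        # On leaving a run of identical calls, append the repeat count.
        if self.repeats > 1:
            self._emit("*%d/" % self.repeats)

    def __call__(self, a, n):
        if self.active is None:
            reply = self.ask("ACTIVATE TRACE? ")
            self.active = reply.strip().upper() != "NO"
        if not self.active:
            return            # trace suppressed: return at once
        if (a, n) == self.last:
            self.repeats += 1
            return
        self._close_run()
        self.last, self.repeats = (a, n), 1
        self._emit("%s %d" % (a, n))

    def flush(self):
        """Close any pending run and return all lines produced so far."""
        self._close_run()
        if self.line:
            self.lines.append(self.line)
            self.line = ""
        self.last, self.repeats = None, 0
        return self.lines
```

With the trace activated, one call with ("INIT", 1), five calls with ("LOOP", 2) and one call with ("DONE", 3) give the single line "INIT 1 LOOP 2 *5/ DONE 3" under these assumptions.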


IV.  9.  2  Method 


The flowchart (Figure IV-11) and listing (Figure VI-9) explain
this  straightforward  subroutine. 


IV.  9.  3  Deactivation 


In  the  code  delivered  to  RADC,  the  subroutine  has  been  altered 
so that the query "ACTIVATE TRACE?" is not printed and no trace is supplied.
This  is  done  by  adding  two  lines  to  WHERE: 


99035  FLAG  =  .FALSE.  ;  PRINT  500

99036  500  FORMAT  (2H  &  ©) 


IV.  10  FUNCTION  YESNO(I) 

Many system-generated queries must be answered either "yes"
or "no". This function reads a string from the remote terminal and sets its
argument to one if "yes" was read or zero if "no" was read. The sophisticated
user is allowed the word "options", which sets the argument to minus
one; any other response causes the system to ask the user to 'ANSWER
"YES" OR "NO".' and to repeat the query.

In  many  applications  it  is  useful  to  call  such  a  routine  as  an 
arithmetic  function,  so  that  the  statement 

IF  ( YESNO(I)  )  1,  2,  3 

branches  to  1  for  "options",  to  2  for  "no"  and  to  3  for  "yes". 
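
The three-way reply can be sketched in Python as below. This is an illustration only and not the FORTRAN source of YESNO. The parameter names query, read and write are assumptions, and the value is returned rather than also stored in an argument I. The helper branch mimics the sign test of the arithmetic IF statement shown above.

```python
def yesno(query, read=input, write=print):
    """Ask `query` until the reply is YES, NO or OPTIONS.

    Returns 1 for "yes", 0 for "no" and -1 for "options".
    """
    while True:
        write(query)
        reply = read().strip().upper()
        if reply == "YES":
            return 1
        if reply == "NO":
            return 0
        if reply == "OPTIONS":
            return -1
        # Any other reply: prompt the user and repeat the query.
        write('ANSWER "YES" OR "NO".')


def branch(value):
    """Select a label by sign, as IF (YESNO(I)) 1, 2, 3 does."""
    if value < 0:
        return 1      # "options"
    if value == 0:
        return 2      # "no"
    return 3          # "yes"
```

A call such as branch(yesno("ACTIVATE TRACE?")) then selects label 1, 2 or 3 in the same way as the arithmetic IF.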

SECTION  V


FURTHER  WORK 

The  work  performed  on  this  project  has  resulted  in  the  design 
and  partial  implementation  of  an  on-line  system  that  promises  to  add  new 
important  capabilities  to  on-line  information  retrieval  systems.  For  this 
promise  to  be  fulfilled,  the  investigation  and  implementation  effort  should 
be  continued. 

Of first importance is the implementation of the complete
On-Line System itself. Once the System is implemented, it will provide the
best  possible  tool  for  experimentation  with  the  techniques  of  automatic 
indexing  within  an  on-line  environment.  Estimates  can  be  made  of  such 
factors  as  expected  response  time,  precision,  and  relevance,  and  valuable 
experimentation  can  be  conducted  using  batch  programs.  However,  the 
combined  impact  upon  the  user  of  all  these  factors  can  be  determined  only 
by  constructing  the  entire  System.  And  it  is  this  total  impact  on  the  user 
that  determines  the  utility  of  these  techniques. 

Simultaneously  with  the  implementation,  experiments  should  be 
performed  to  determine  optimum  settings  for  the  various  features  of  the 
indexing  programs  that  can  be  varied  parametrically.  The  value  of  such 
experimentation  is  vividly  illustrated  by  the  discussion  of  the  dictionary 
of 900 stems (Figure III-4) in subsection III. 2. 3. By studying the effect of
variation  of  these  various  factors  upon  the  results  obtained,  the  System  can 
be  "tuned"  to  maximize  its  performance. 

SECTION  VI


LISTINGS 


This  Section  contains  listings  of  all  programs  and  subprograms 
that  comprise  the  dialogue  processor  and  its  associated  supporting  software. 
The  Directory  of  Programs  and  Subprograms  in  Figure  IV-1  serves  to  index 
this  Section. 

Figure VI-1   Function DOCK
Figure VI-2   Function LENGTH
Figure VI-3   Function LOOKUP
Figure VI-4   Subroutine NUMBER
              Function ZORCH
Figure VI-5   Subroutine OUT
Figure VI-6   Subroutine PLUCK
Figure VI-7   Function PUT
Figure VI-8   Subroutine STEM
Figure VI-9   Subroutine WHERE
Figure VI-11  Program DIALOGUE
Figure VI-12  Program CONGRA
Figure VI-13  Program DICGEN

ERRATA


The  reader  of  this  report  should  be  aware  of  the 
contributions made by Dr. Gerard Salton of the Dept. of
Computer  Sciences  at  Cornell  University  through  his 
research  in  information  storage  and  retrieval,  particularly 
the  SMART  document  retrieval  system.  Many  of  the  notions 
and  methodologies  concerned  with  document  retrieval 
described  in  this  report  are  attributable  to  him.  For 
example,  the  concept  vector  technique  as  used  in  this  system, 
and  fundamental  to  it,  should  be  credited  to  Dr.  Salton  as 
well as the techniques used for document-document and
query-document correlation. Publications authored by
Dr. Salton and his students (1A, 2A, 3A, 4A, 5A) were
consulted and significantly influenced the overall system
design. The reader is encouraged to refer to these documents
as well as to RADC-TR-69-304 (6A) for further details.